Abstract

We study the properties of false discovery rate (FDR) thresholding, viewed as a
classification procedure. The "0"-class (null) is assumed to have a known, symmetric
log-concave density while the "1"-class (alternative) is obtained from the "0"-class either
by translation (location model) or by scaling (scale model). Furthermore, the "1"-class is
assumed to have a small number of elements w.r.t. the "0"-class (sparsity). Non-asymptotic
oracle inequalities are derived for the excess risk of FDR thresholding. In a regime where
Bayes power is away from 0 and 1, these inequalities lead to explicit rates of convergence of
the excess risk to zero. Moreover, these theoretical investigations suggest an explicit choice
for the nominal level $\alpha_m$ of FDR thresholding, as a function of $m$. Our oracle
inequalities show theoretically that the resulting FDR thresholding adapts to the unknown
sparsity regime contained in the data. This property is illustrated with numerical
experiments, which show that the proposed choice of $\alpha_m$ is relevant for practical use.

1 Introduction

1.1 Background

The false discovery rate (FDR) has become a standard for analyzing many types of data,
such as microarray or neuro-imaging. Although it was motivated by pure testing considerations,
recent studies have shown that the Benjamini-Hochberg FDR controlling procedure proposed by
[ ] enjoys remarkable properties as a detection procedure [ ] and as an estimation procedure
[1, 10]. More specifically, it turns out to be adaptive to the amount of "signal" contained in
the data, which has been referred to as "adaptation to unknown sparsity".

Recently, an important theoretical breakthrough has been made with the study of FDR 
thresholding in a classification framework, where asymptotic results were proved in a Gaussian 
scale model [b] (see also [17] and [7]). The present paper extends this work by studying the 
adaptation to unknown sparsity of FDR thresholding non-asymptotically and in more general 
models (location/scale models with symmetric log-concave densities, see Section 1.6 for a 
detailed comparison to [ ]). 



1.2 Initial setting 

Let $(X_i, H_i) \in \mathbb{R} \times \{0, 1\}$, $1 \le i \le m$, be $m$ i.i.d. variables.
Assume that the sample $X_1, \dots, X_m$ is observed without the labels $H_1, \dots, H_m$ and
that the distribution of $X_1$ conditionally on $H_1 = 0$ is known a priori. We consider the
following general classification problem: build a (measurable) classification rule
$\hat h_m : \mathbb{R} \to \{0, 1\}$, depending on $X_1, \dots, X_m$, such that, for a new
labeled data point $(X_{m+1}, H_{m+1}) \sim (X_1, H_1)$ independent of
$(X_i, H_i)_{1 \le i \le m}$, the (integrated) misclassification risk

$R_m(\hat h_m) = P(\hat h_m(X_{m+1}) \neq H_{m+1})$    (1)

is as small as possible.

The distribution of $(X_1, H_1)$ is assumed to belong to a specific parametric subset of
distributions on $\mathbb{R} \times \{0, 1\}$, which is defined as follows:

(i) the distribution of $H_1$ is such that the (unknown) mixture parameter
$\tau_m = \pi_{0,m}/\pi_{1,m}$ satisfies $\tau_m > 1$, where $\pi_{0,m} = P(H_1 = 0)$ and
$\pi_{1,m} = P(H_1 = 1) = 1 - \pi_{0,m}$.

(ii) the distribution of $X_1$ conditionally on $H_1 = 0$ has a density $d(\cdot)$ w.r.t. the
Lebesgue measure on $\mathbb{R}$ of the form $d(x) = e^{-\phi(|x|)}$ for a known function
$\phi$ satisfying

$\phi : \mathbb{R}^+ \to \mathbb{R}$ is $C^1$, increasing and convex on $\mathbb{R}^+$ with $\int_{-\infty}^{+\infty} e^{-\phi(|x|)}\,dx = 1$.    (A($\phi$))

(iii) the distribution of $X_1$ conditionally on $H_1 = 1$ has a density $d_{1,m}(\cdot)$
w.r.t. the Lebesgue measure on $\mathbb{R}$ of either of the two following types:

- location: $d_{1,m}(x) = d(x - \mu_m)$, for an (unknown) location parameter $\mu_m > 0$;

- scale: $d_{1,m}(x) = d(x/\sigma_m)/\sigma_m$, for an (unknown) scale parameter $\sigma_m > 1$.

An important point in our setting is that the parameters, $(\tau_m, \mu_m)$ in the location
model or $(\tau_m, \sigma_m)$ in the scale model, are assumed to depend on the sample size $m$.
More precisely, the parameter $\tau_m$, called the sparsity parameter, is assumed to tend to
infinity as $m$ tends to infinity, which means that the unlabeled sample only contains a
small, vanishing proportion of label 1. This condition is denoted (Sp). As a counterpart, the
other parameter ($\mu_m$ in the location model, $\sigma_m$ in the scale model) is assumed to
tend to infinity fast enough to balance sparsity. This makes the problem "just solvable" under
the sparsity constraint. More precisely, our setting corresponds to the case where the power
of Bayes procedure is away from 0 and 1, a condition denoted (BP).

This setting is motivated by practical situations such as source detection in astronomy or
DNA copy number studies in biology, where the resolution of a measurement device increases,
while the observed phenomenon is localized and has a fixed signal strength. When increasing
the resolution $m$, the proportion $\pi_{1,m}$ of active loci decreases while the signal to
noise ratio of (some of) the active loci increases (i.e., these loci are generated from a
model with an increasing parameter $\mu_m$ or $\sigma_m$).

Assumption (A($\phi$)) sets a condition on $d(x) = e^{-\phi(|x|)}$ slightly stronger than "$d$
is symmetric log-concave". Namely, it also entails that $d(\cdot)$ is decreasing on
$\mathbb{R}^+$. In the location model, this is essential to get a monotonic likelihood ratio,
as we will see below. Also, this assumption is convenient to get expressions for tails and
quantiles related to the distribution induced by $d(\cdot)$, see Appendix A. Throughout the
paper, a leading example of density satisfying (A($\phi$)) is the so-called $\zeta$-Subbotin
density, $\zeta \ge 1$, defined by

$d(x) = (L_\zeta)^{-1} e^{-|x|^\zeta/\zeta}$, with $L_\zeta = \int_{-\infty}^{+\infty} e^{-|x|^\zeta/\zeta}\,dx$,    (2)

that is, $d(x) = e^{-\phi(|x|)}$ with $\phi(u) = u^\zeta/\zeta + \log(L_\zeta)$. The particular
values $\zeta = 1, 2$ give rise to the Laplace and Gaussian case, respectively. The
classification problem under investigation is illustrated in Figure 1 (left panel), in the
Gaussian location case.
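As a concrete illustration of the Subbotin family (2), the following minimal Python sketch evaluates $\phi$ and $d$. The function names are hypothetical, and the closed form $L_\zeta = 2\,\Gamma(1/\zeta)\,\zeta^{1/\zeta - 1}$ is not quoted from the text above: it is a standard calculus identity obtained from the change of variable $y = x^\zeta/\zeta$.

```python
import math


def subbotin_norm_const(zeta: float) -> float:
    """L_zeta = integral over R of exp(-|x|^zeta / zeta) dx.

    Closed form 2 * Gamma(1/zeta) * zeta^(1/zeta - 1), obtained by the
    substitution y = x^zeta / zeta (an assumption of this sketch, not a
    formula quoted from the paper).
    """
    return 2.0 * math.gamma(1.0 / zeta) * zeta ** (1.0 / zeta - 1.0)


def phi(u: float, zeta: float) -> float:
    """phi(u) = u^zeta / zeta + log(L_zeta), for u >= 0."""
    return u ** zeta / zeta + math.log(subbotin_norm_const(zeta))


def subbotin_density(x: float, zeta: float) -> float:
    """d(x) = exp(-phi(|x|)); zeta = 1 is Laplace, zeta = 2 is Gaussian."""
    return math.exp(-phi(abs(x), zeta))
```

For $\zeta = 1$ this gives $L_1 = 2$ (density $e^{-|x|}/2$) and for $\zeta = 2$ it gives $L_2 = \sqrt{2\pi}$ (standard Gaussian density).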
1.3 FDR thresholding

Classically, the solution that minimizes the misclassification risk (1) is the so-called Bayes
rule that chooses the label 1 as soon as $d_{1,m}(x)/d(x)$ is larger than a specific threshold.
Assuming (A($\phi$)), the likelihood ratio $d_{1,m}(x)/d(x)$ is nondecreasing in $x$ and $|x|$
for the location and the scale model, respectively. In the location case, this comes from
$\phi(|x|) - \phi(|x - \mu_m|)$ being nondecreasing in $x \in \mathbb{R}$ (because $\phi$ is
convex increasing). In the scale case, this results from $\phi(u) - \phi(u/\sigma_m)$ being
increasing in $u \ge 0$ (because $\phi$ is convex). As a consequence, we can restrict attention
to classification rules $h_m(x)$ of the form $1\{x \ge s_m\}$, $s_m \in \mathbb{R}$, for the
location model, and $1\{|x| \ge s_m\}$, $s_m \in \mathbb{R}^+$, for the scale model. Therefore,
thresholding procedures are classification rules of primary interest, and the main challenge
consists in choosing the threshold $s_m$ as a function of $X_1, \dots, X_m$.

The FDR controlling method proposed in [z] (also called "Benjamini-Hochberg" thresholding)
provides such a threshold $s_m$ in a very simple way once we can compute the quantile function
$\overline{D}^{-1}(\cdot)$, where $\overline{D}(u) = \int_u^{+\infty} e^{-\phi(|x|)}\,dx$ is
the (known) upper-tail cumulative distribution function of $X_1$ conditionally on $H_1 = 0$.
In the location model, FDR thresholding is defined as follows:

Algorithm 1.1.

1. choose a nominal level $\alpha_m \in (0, 1)$;

2. consider the order statistics of the $X_k$'s: $X_{(1)} \ge X_{(2)} \ge \dots \ge X_{(m)}$;

3. take the integer

$\hat k = \max\{1 \le k \le m : X_{(k)} \ge \overline{D}^{-1}(\alpha_m k/m)\}$

when this set is non-empty and $\hat k = 1$ otherwise;

4. use $\hat h_m^{FDR}(x) = 1\{x \ge \hat s_m^{FDR}\}$ for $\hat s_m^{FDR} = \overline{D}^{-1}(\alpha_m \hat k/m)$.

For the scale model, FDR thresholding has a similar form:
$\hat h_m^{FDR}(x) = 1\{|x| \ge \hat s_m^{FDR}\}$ for
$\hat s_m^{FDR} = \overline{D}^{-1}(\alpha_m \hat k/(2m))$, where
$\hat k = \max\{1 \le k \le m : |X|_{(k)} \ge \overline{D}^{-1}(\alpha_m k/(2m))\}$
($\hat k = 1$ if the set is empty) and $|X|_{(1)} \ge |X|_{(2)} \ge \dots \ge |X|_{(m)}$.
Algorithm 1.1 is illustrated in Figure 1 (right panel), in a Gaussian location setting. Note
that taking $\hat k = 1$ whenever the set
$\{1 \le k \le m : X_{(k)} \ge \overline{D}^{-1}(\alpha_m k/m)\}$ is empty does not correspond
to the original formulation of [ ], as they choose $\hat k = 0$ in that case. This modification
is required to tackle the "hyper-sparse" setting where $\tau_m \propto m$ (as explained in
Section 6.1, it does not change the corresponding multiple testing procedure). Finally, the FDR
procedure depends on a tuning parameter $\alpha_m \in (0, 1)$ which should be chosen carefully,
as we will explain further on.
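The steps of Algorithm 1.1 can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not a reference implementation: the null density is taken to be standard Gaussian (so that $\overline{D}^{-1}(u)$ is the Gaussian quantile of order $1 - u$), and the function names `upper_tail_quantile`, `fdr_threshold_location` and `classify` are hypothetical.

```python
from statistics import NormalDist

_STD_NORMAL = NormalDist()  # assumption: the null density d is N(0, 1)


def upper_tail_quantile(u: float) -> float:
    """D-bar^{-1}(u): the point x such that P(X >= x) = u under the null."""
    return _STD_NORMAL.inv_cdf(1.0 - u)


def fdr_threshold_location(xs, alpha):
    """Steps 2-4 of Algorithm 1.1 (location model).

    Returns (s_hat, k_hat); k_hat falls back to 1 when no order statistic
    exceeds its boundary, as in the modified formulation described above.
    """
    m = len(xs)
    ordered = sorted(xs, reverse=True)  # X_(1) >= X_(2) >= ... >= X_(m)
    k_hat = 1
    for k in range(1, m + 1):
        if ordered[k - 1] >= upper_tail_quantile(alpha * k / m):
            k_hat = k  # keep the largest k satisfying the inequality
    return upper_tail_quantile(alpha * k_hat / m), k_hat


def classify(x: float, s_hat: float) -> int:
    """FDR classification rule: label 1 as soon as x >= s_hat."""
    return 1 if x >= s_hat else 0
```

For the scale model, the same loop would be run on the ordered $|X_i|$'s with $\alpha_m k/(2m)$ in place of $\alpha_m k/m$.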

1.4 Aim and scope of the paper 

In this paper, we aim at studying the performance of FDR thresholding as a classification
rule in terms of the excess risk $R_m(\hat h_m^{FDR}) - R_m(h_m^B)$, both in location and scale
models. We investigate two types of theoretical results:

(i) Non-asymptotic oracle inequalities: prove, for each (or large) $m$, an inequality of the
form

$R_m(\hat h_m^{FDR}) - R_m(h_m^B) \le b(\phi, m, \alpha_m, \tau_m)$,    (3)

where $b(\phi, m, \alpha_m, \tau_m)$ is an upper bound (depending on additional constants),
which we aim to be "as small as possible".
Figure 1: Left: illustration of the considered classification problem for the Gaussian location
model; density of $N(0, 1)$ (solid line); $X_k$, $k = 1, \dots, m$ (crosses); a new data point
$X_{m+1}$ to be classified (star); Bayes rule (dotted line); FDR rule $\hat s_m^{FDR}$ for
$\alpha_m = 0.3$ (dashed line). Right: illustration of the FDR algorithm for $\alpha_m = 0.3$;
$k \in \{1, \dots, m\} \mapsto \overline{\Phi}^{-1}(\alpha_m k/m)$ (solid line); $X_{(k)}$
(crosses); $\hat s_m^{FDR}$ (dashed horizontal line); $\hat k = 6$ (dashed vertical line).
Here, $\overline{\Phi}(x) = P(X \ge x)$ for $X \sim N(0, 1)$. $m = 18$; $\mu_m = 3$;
$\tau_m = 5$. For this realization, 5 labels "1" and 13 labels "0".



(ii) Convergence rate: find a sequence $(\alpha_m)_m$ such that there exists $D > 0$ such that
for all $m \ge 2$,

$R_m(\hat h_m^{FDR}) - R_m(h_m^B) \le D \times R_m(h_m^B) \times \rho_m$,    (4)

for a given rate $\rho_m = o(1)$.

The property (4) is called "optimal at rate $\rho_m$". It implies that
$R_m(\hat h_m^{FDR}) \sim R_m(h_m^B)$, that is, $\hat h_m^{FDR}$ is "asymptotically optimal",
as defined in [)]. However, (4) is substantially more informative because it provides a rate of
convergence.

We should emphasize at this point that the trivial procedure $\hat h_m^{0} \equiv 0$ (which
always chooses the label "0") satisfies (4) with $\rho_m = O(1)$ (under our setting (BP)).
Therefore, proving (4) with $\rho_m = O(1)$ is not sufficient to get an interesting result and
our goal is to obtain a rate $\rho_m$ that tends to zero within (4). The reason why
$\hat h_m^{0}$ is already "competitive" is that we consider a sparse setting where label "0" is
produced with high probability.

1.5 Overview of the paper 

First, Section 2 presents a more general setting than the one of Section 1.2. Namely, the
location and scale models can be seen as particular cases of a general "p-value model" after a
standardization of the original $X_i$'s into p-values $p_i$'s. The so-obtained p-values are
uniformly distributed on (0, 1) under the label 0 while they follow a distribution with
decreasing density $f_m$ under the label 1. Hence, procedures of primary interest (including
Bayes rule) are p-value thresholding procedures, that choose the label 1 for p-values smaller
than some threshold $\hat t_m$. Throughout the paper, we focus on this type of procedures, and
any procedure $\hat h_m$ is identified by a threshold $\hat t_m$ in the notation. Translated
in this "p-value world", we describe in Section 2 Bayes rule, Bayes risk, condition (BP), pFDR
and FDR thresholding. pFDR thresholding, as proposed by [ u], can be seen as a theoretical
substitute to FDR thresholding. It is extensively used in our approach.

The fundamental results are stated in Section 3 in the general p-value model. As pFDR 
thresholding is much easier to study than FDR thresholding from a mathematical point of 
view, our approach is first to state an oracle inequality for pFDR, see Theorem 3.1, and 
second to use a concentration argument of the FDR threshold around the pFDR threshold 
to obtain an oracle inequality of the form (3), see Theorem 3.2. At this point, the bounds
involve quantities which are not written in explicit form, and which depend on the
density of the p-values corresponding to the label 1.

The particular case where $f_m$ comes from a location or a scale model is investigated in
Section 4. For this, an important property is that under (A($\phi$)), the upper-tail
distribution function $\overline{D}(\cdot)$ and the quantile function
$\overline{D}^{-1}(\cdot)$ can be bounded in terms of $d(\cdot)$, $\phi$, $\phi'$ and
$\phi^{-1}$, see Appendix A. By using this property, we derive from Theorems 3.1 and 3.2
several inequalities of the form (3) and (4). In particular, in the sparsity regime
$\tau_m = m^{\beta}$, $0 < \beta \le 1$, and for a $\zeta$-Subbotin density given by (2), we
derive that the FDR threshold $\hat t_m^{FDR}$ at level $\alpha_m$ is asymptotically optimal
(under (BP) and (Sp)) in either of the two following cases:

- for the location model, $\zeta > 1$, if $\alpha_m \to 0$ and $\log \alpha_m = o((\log m)^{1 - 1/\zeta})$;

- for the scale model, $\zeta \ge 1$, if $\alpha_m \to 0$ and $\log \alpha_m = o(\log m)$.

Furthermore, choosing $\alpha_m \propto 1/(\log m)^{1 - 1/\zeta}$ (location) or
$\alpha_m \propto 1/(\log m)$ (scale) provides a convergence rate
$\rho_m = 1/(\log m)^{1 - 1/\zeta}$ (location) or $\rho_m = 1/(\log m)$ (scale), respectively.

At this point, one can argue that the latter convergence results are not fully satisfactory:
first, these results do not provide an explicit choice for $\alpha_m$ for a given finite value
of $m$. Second, the rate of convergence $\rho_m$ being rather slow, we can legitimately ask
whether a faster rate can be obtained. Third, we should check numerically that FDR thresholding
does significantly better than null thresholding for a moderately large $m$.

First, we address the choice of $\alpha_m$ by carefully studying Bayes thresholding and how it
is related to pFDR thresholding, see Sections 2.5, 4.1 and 5.2. More precisely, let us consider
the sparsity regime $\tau_m = m^{\beta}$, $\beta \in [\beta_-, \beta_+]$, for
$0 < \beta_- \le \beta_+ \le 1$. Also, assume that the power $C_m$ of Bayes rule lies in the
range $[C_-, C_+]$ for $0 < C_- \le C_+ < 1$. Then, our recommendation is to choose
$\beta_0 = (\beta_- + \beta_+)/2$, $C_0 = (C_- + C_+)/2$ and

[...] for the location model, $\zeta > 1$;    (5)

[...] for the scale model, $\zeta \ge 1$.    (6)

In particular, the cases $\zeta = 1, 2$ give rise to the following choices:

$\alpha_m^{\infty}(\beta_0, C_0) = \{1 + C_0\, e^{(\overline{\Phi}^{-1}(C_0))^2/2} \sqrt{4\pi\beta_0 \log m}\}^{-1}$    (Gaussian location, $\zeta = 2$);

$\alpha_m^{\infty}(\beta_0, C_0) = \{1 + C_0 \beta_0 \sqrt{2\pi}\, e^{(\overline{\Phi}^{-1}(C_0/2))^2/2}\, (\overline{\Phi}^{-1}(C_0/2))^{-1} \log m\}^{-1}$    (Gaussian scale, $\zeta = 2$);

$\alpha_m^{\infty}(\beta_0, C_0) = \{1 + \beta_0 (\log(1/C_0))^{-1} \log m\}^{-1}$    (Laplace scale, $\zeta = 1$),

where $\overline{\Phi}^{-1}(C_0)$ and $\overline{\Phi}^{-1}(C_0/2)$ denote the quantiles of
order $1 - C_0$ and $1 - C_0/2$ of a standard Gaussian variable, with
$\overline{\Phi}(x) = P(X \ge x)$ for $X \sim N(0, 1)$. More specifically, $\alpha_m$ given by
either (5) or (6), denoted $\alpha_m^{\infty}(\beta_0, C_0)$ for short, is derived as an
equivalent as $m$ tends to infinity of a quantity $\alpha_m^{opt}(\beta_0, C_0)$ that enjoys
some optimality property for the pFDR threshold when the model parameters are
$(\beta_0, C_0)$, see Sections 2.5 and 4.1. While $\alpha_m^{\infty}(\beta_0, C_0)$ and
$\alpha_m^{opt}(\beta_0, C_0)$ behave similarly for large $m$ (say, $m \ge 1000$), it is better
to use $\alpha_m^{opt}(\beta_0, C_0)$ for small values of $m$ (say, $m \le 100$). However, the
level $\alpha_m^{opt}(\beta_0, C_0)$ has a less explicit expression and should be computed
numerically, see Section 5.2.
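To make the recommended levels concrete, here is a minimal Python sketch of the three special cases. It rests on assumptions that should be checked against the published formulas: the Laplace scale level is taken to be $\{1 + \beta_0 \log m/\log(1/C_0)\}^{-1}$, and the two Gaussian levels are taken in the forms written in the docstrings, with Gaussian quantiles of order $1 - C_0$ and $1 - C_0/2$. The function names are hypothetical.

```python
import math
from statistics import NormalDist

_STD_NORMAL = NormalDist()


def alpha_laplace_scale(m: int, beta0: float, c0: float) -> float:
    """Assumed form: {1 + beta0 * log(m) / log(1/C0)}^{-1}."""
    return 1.0 / (1.0 + beta0 * math.log(m) / math.log(1.0 / c0))


def alpha_gaussian_location(m: int, beta0: float, c0: float) -> float:
    """Assumed form: {1 + C0 * exp(z^2/2) * sqrt(4*pi*beta0*log m)}^{-1},
    with z the standard Gaussian quantile of order 1 - C0."""
    z = _STD_NORMAL.inv_cdf(1.0 - c0)
    q = c0 * math.exp(z * z / 2.0) * math.sqrt(4.0 * math.pi * beta0 * math.log(m))
    return 1.0 / (1.0 + q)


def alpha_gaussian_scale(m: int, beta0: float, c0: float) -> float:
    """Assumed form: {1 + C0*beta0*sqrt(2*pi)*exp(z^2/2)/z * log m}^{-1},
    with z the standard Gaussian quantile of order 1 - C0/2 (z > 0)."""
    z = _STD_NORMAL.inv_cdf(1.0 - c0 / 2.0)
    q = c0 * beta0 * math.sqrt(2.0 * math.pi) * math.exp(z * z / 2.0) / z * math.log(m)
    return 1.0 / (1.0 + q)
```

Under these assumed forms, all three levels decrease slowly with $m$: like $1/\sqrt{\log m}$ in the Gaussian location case and like $1/\log m$ in the two scale cases, in line with the rates stated above.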

Second, to address the rate issue, we provide a lower bound for the Laplace scale model
in Section 4.4. More precisely, we show in that case that the rate of convergence of pFDR
thresholding cannot be faster than $1/(\log m)$ for several values of $\beta$ at a time (see
Corollary 4.6). This means that the rate derived by our methodology is the correct one (at
least for the pFDR and in the Laplace case).

Third, in Section 5, the performance of FDR thresholding (choosing $\alpha_m$ as suggested
above) is evaluated numerically and compared to null thresholding, for several values of $m$
and $\zeta$. We show that the excess risk of the FDR is much smaller than the one of null
thresholding for a remarkably wide range of values for $\beta$ and several $m$. This
illustrates the adaptation of the FDR procedure w.r.t. the unknown sparsity regime. Also, for
comparison, we show that choosing $\alpha_m$ fixed with $m$ (say, $\alpha_m = 0.05$) can lead
to higher FDR thresholding excess risk for some values of $m$.

Finally, let us note that while our assumptions will exclude the case $\zeta = 1$ in the
location model, our methodology can be adapted to some extent to this particular case, see
Section 6.4.

1.6 Relation to previous work 

First, Theorem 5.3 of [ ] showed in the Gaussian scale model that FDR thresholding is
asymptotically optimal, i.e., $\rho_m = o(1)$ in (4). They also found the sufficient condition
$\alpha_m \to 0$ and $\log \alpha_m = o(\log m)$ (which corroborates our condition in this
particular model). Our results substantially extend this work to location and scale models
using symmetric log-concave densities, such as the Subbotin density (2). Additionally, we also
provide finite sample results with an explicit convergence rate $\rho_m$ and a choice of
$\alpha_m$ supported both by theory and numerical experiments. Another advantage of our
approach is that our proofs appear substantially shorter and simpler.

Second, in [ ] and [ ], for the Gaussian location model and the Laplace scale model,
respectively, it is proved that FDR thresholding is asymptotically minimax for estimating
the parameter of interest (which roughly corresponds to $\mu_m$ and $\sigma_m$, respectively)
over specific sparsity classes (Theorem 1.1 in [ ] and Theorem 1.3 in ^]). We can legitimately
ask whether such a property holds in our classification framework. It would correspond to the
following property: under (BP) and (Sp), for any $\beta_0 \in (0, 1)$,

[...]    (7)

where the infimum is taken over the set of thresholds that can be written as measurable
functions of the p-values. As $R_m(\hat t_m) \ge R_m(t_m^B)$ for any
$\beta \in [\beta_0, 1]$, (4) implies that FDR thresholding satisfies (7), with an additional
explicit rate $\rho_m$. However, a more interesting (but possibly more challenging) task would
be to show (7) in terms of relative excess risk, as discussed in Section 6.5.

Third, the way the model parameters depend on $m$ in our setting differs from [ ] and [1].
These studies investigate detection and estimation problems, respectively, in the Gaussian
location model. Their setting corresponds to the case where $C_m$ tends to zero. This
assumption is not relevant in the present classification setting, because it entails that null
thresholding is asymptotically optimal (see Remark 4.2 in Section 4). In the present paper, we
therefore focus on sparsity regimes where $C_m$ remains bounded away from 0, that is, on
regimes where we can hope to improve substantially over null thresholding.

Finally, let us emphasize that the classification setting described in Section 1.2 is connected
to machine learning theory: namely, to Learning from Positive and Unlabeled Examples
(LPUE) or Semi-Supervised Novelty Detection (SSND), see [ ]. In that paper, the distribution
under the label "0" is unknown but we have at hand a large sample following this distribution
("nominal" sample). The goal is to recover the labels from the nominal sample and the
"contaminated" sample $X_1, \dots, X_m$. However, [x] uses the Neyman-Pearson criterion, not
the misclassification risk.

2 General setting

2.1 p-value model

Let $(p_i, H_i) \in [0, 1] \times \{0, 1\}$, $1 \le i \le m$, be $m$ i.i.d. variables. The
distribution of $(p_1, H_1)$ is assumed to belong to a specific subset of distributions on
$[0, 1] \times \{0, 1\}$, which is defined as follows:

(i) the distribution of $H_1$ is such that the (unknown) mixture parameter
$\tau_m = \pi_{0,m}/\pi_{1,m}$ satisfies $\tau_m > 1$, where $\pi_{0,m} = P(H_1 = 0)$ and
$\pi_{1,m} = P(H_1 = 1) = 1 - \pi_{0,m}$;

(ii) the distribution of $p_1$ conditionally on $H_1 = 0$ is uniform on (0, 1);

(iii) the distribution of $p_1$ conditionally on $H_1 = 1$ has a c.d.f. $F_m$ satisfying

$F_m$ is continuous increasing on $[0, 1]$ and differentiable on $(0, 1)$,
$f_m = F_m'$ is continuous decreasing with $f_m(0^+) > \tau_m > f_m(1^-)$.    (A($F_m, \tau_m$))

This way, we obtain a family of i.i.d. p-values, where each p-value has a marginal distribution
following the mixture model:

$p_i \sim \pi_{0,m}\, U(0, 1) + \pi_{1,m}\, F_m$.    (8)
The model (8) is classical in the multiple testing literature and is usually called the
"two-groups mixture model". It has been widely used since its introduction by Efron et al.
(2001) [12], see for instance [30, 18, 11].

The models presented in Section 1.2 are particular instances of this p-value model. In the
scale model, we apply the standardization $p_i = 2\overline{D}(|X_i|)$, which yields
$F_m(t) = 2\overline{D}(\overline{D}^{-1}(t/2)/\sigma_m)$. We can check that if $\phi$
satisfies (A($\phi$)), then $F_m(t) = 2\overline{D}(\overline{D}^{-1}(t/2)/\sigma_m)$ satisfies
(A($F_m, \tau_m$)), with $f_m(0^+) = +\infty$ and $f_m(1^-) < 1$, see Section 8.1. In the
location model, we let $p_i = \overline{D}(X_i)$, which yields
$F_m(t) = \overline{D}(\overline{D}^{-1}(t) - \mu_m)$. Here, (A($\phi$)) is not sufficient to
ensure that $f_m = F_m'$ is decreasing: e.g., in the Laplace location model where
$\phi(u) = u + \log 2$, $f_m$ is only non-increasing. We will thus use the following additional
assumption on $\phi$ for the location case:

$\phi$ satisfies (A($\phi$)) and $\phi'$ is increasing on $\mathbb{R}^+$ with $\lim_{+\infty} \phi' = +\infty$,    (A'($\phi$))

which ensures that $F_m(t) = \overline{D}(\overline{D}^{-1}(t) - \mu_m)$ satisfies
(A($F_m, \tau_m$)), with $f_m(0^+) = +\infty$ and $f_m(1^-) = 0$, as proved in Section 8.1.
Finally, an illustration of the p-value model is given in Figure 2.

Remark 2.1. Assuming that $f_m$ is decreasing is convenient to ensure that Bayes procedure
is unique, with an explicit expression, see below. While it excludes some cases of potential
interest such as the Laplace location case ($\phi(u) = u + \log 2$), it simplifies our
approach. Furthermore, note that the Laplace location case is discussed separately in
Section 6.4.
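The standardization into p-values can be sketched as follows in Python. This is only an illustration under the assumption of a standard Gaussian null (so that $\overline{D}$ is the Gaussian upper tail); the function names are hypothetical.

```python
from statistics import NormalDist

_STD_NORMAL = NormalDist()  # assumption: Gaussian null, D-bar = Gaussian upper tail


def upper_tail(x: float) -> float:
    """D-bar(x) = P(X >= x) under the null."""
    return 1.0 - _STD_NORMAL.cdf(x)


def p_value_location(x: float) -> float:
    """Location model: p_i = D-bar(X_i)."""
    return upper_tail(x)


def p_value_scale(x: float) -> float:
    """Scale model: p_i = 2 * D-bar(|X_i|)."""
    return 2.0 * upper_tail(abs(x))


def alt_cdf_location(t: float, mu: float) -> float:
    """F_m(t) = D-bar(D-bar^{-1}(t) - mu): c.d.f. of the p-values under label 1."""
    return upper_tail(_STD_NORMAL.inv_cdf(1.0 - t) - mu)


def alt_cdf_scale(t: float, sigma: float) -> float:
    """F_m(t) = 2 * D-bar(D-bar^{-1}(t/2) / sigma)."""
    return 2.0 * upper_tail(_STD_NORMAL.inv_cdf(1.0 - t / 2.0) / sigma)
```

With $\mu = 0$ or $\sigma = 1$ both c.d.f.s reduce to the uniform one, $F_m(t) = t$; for $\mu > 0$ or $\sigma > 1$ they lie above the diagonal, which reflects the fact that small p-values are more likely under label 1.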




Figure 2: Left: Gaussian location model in the $X_i$ world; density of $N(0, 1)$ (thick solid
line); density of $N(\mu_m, 1)$ (solid line); $X_i$, $i = 1, \dots, m$ (crosses); Bayes rule
(dotted line). Right: Same model and observations in the p-value world: density of $U(0, 1)$
(thick solid line); density $f_m$ (solid line); $p_i$, $i = 1, \dots, m$ (crosses); Bayes rule
(dotted line). The location parameter is $\mu_m = 2$ and the number of observations is
$m = 10$.



2.2 Procedures and risk 

A classification procedure is identified with a threshold $\hat t_m \in [0, 1]$, that is, a
measurable function of the p-value family $(p_i, i \in \{1, \dots, m\})$, which chooses label
1 whenever the p-value is smaller than $\hat t_m$. The performance of $\hat t_m$ is measured
via the (integrated) misclassification risk, which is defined as follows:

$R_m(\hat t_m) = E(\pi_{0,m} \hat t_m + \pi_{1,m}(1 - F_m(\hat t_m)))$.    (9)

In the particular case of a deterministic threshold $t_m \in [0, 1]$, we have
$R_m(t_m) = \pi_{0,m} t_m + \pi_{1,m}(1 - F_m(t_m))$.

Remark 2.2.

1. Another classical choice for the risk is the averaged misclassification probability of
$\hat t_m$ over the unlabeled sample itself:

$\widetilde{R}_m(\hat t_m) = E\left( m^{-1} \sum_{i=1}^{m} 1\{p_i \le \hat t_m, H_i = 0\} + m^{-1} \sum_{i=1}^{m} 1\{p_i > \hat t_m, H_i = 1\} \right)$
$= m^{-1} \sum_{i=1}^{m} P(p_i \le \hat t_m, H_i = 0) + m^{-1} \sum_{i=1}^{m} P(p_i > \hat t_m, H_i = 1)$.    (10)

As a matter of fact, our results also hold for this risk, as discussed in Section 6.1.

2. Our methodology can also be easily extended to the weighted misclassification risk, as
discussed in Section 6.2. However, we have chosen to present the non-weighted case for
clarity.



2.3 Bayes procedure 

An optimal thresholding is defined as any $\hat t_m$ satisfying

$R_m(\hat t_m) = \min_{\hat t_m'} \{R_m(\hat t_m')\}$,    (11)

where the minimum is taken over all measurable functions from $[0, 1]^m$ to $[0, 1]$ that take
as input the p-value family $(p_i, i \in \{1, \dots, m\})$. By the concavity of $F_m$, any
procedure $\hat t_m$ has a risk at least as large as that of its expected value, that is,
$R_m(\hat t_m) \ge R_m(E(\hat t_m))$. As a consequence, the minimum in (11) can be taken only
over the deterministic thresholds $t_m' \in [0, 1]$, that is,
$\min_{\hat t_m'} \{R_m(\hat t_m')\}$ is equal to $\min_{t_m' \in [0, 1]} \{R_m(t_m')\}$.
Assuming (A($F_m, \tau_m$)), the latter optimization problem has a unique solution.

Lemma 2.3. Under Assumption (A($F_m, \tau_m$)), the minimum of $t \in [0, 1] \mapsto R_m(t)$
exists, is unique and is given by

$t_m^B = f_m^{-1}(\tau_m) \in (0, 1)$.    (12)

The threshold $t_m^B$ is called Bayes threshold and $R_m(t_m^B)$ is called Bayes risk. Bayes
threshold is unknown because it depends on $\tau_m$ and on the data distribution $f_m$.
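Lemma 2.3 can be checked numerically in a case where everything is explicit. The sketch below assumes the Laplace scale model, for which the c.d.f. of the p-values under label 1 is $F_m(t) = t^{1/\sigma_m}$, so that $f_m(t) = t^{1/\sigma_m - 1}/\sigma_m$ and the equation $f_m(t) = \tau_m$ has the closed-form solution $t = (\sigma_m \tau_m)^{-\sigma_m/(\sigma_m - 1)}$. The function names are hypothetical.

```python
def laplace_scale_cdf(t: float, sigma: float) -> float:
    """F_m(t) = t^(1/sigma) (Laplace scale model, sigma > 1)."""
    return t ** (1.0 / sigma)


def laplace_scale_density(t: float, sigma: float) -> float:
    """f_m(t) = F_m'(t) = t^(1/sigma - 1) / sigma, decreasing on (0, 1)."""
    return t ** (1.0 / sigma - 1.0) / sigma


def risk(t: float, sigma: float, tau: float) -> float:
    """R_m(t) = pi0 * t + pi1 * (1 - F_m(t)), with tau = pi0 / pi1."""
    pi0 = tau / (1.0 + tau)
    pi1 = 1.0 - pi0
    return pi0 * t + pi1 * (1.0 - laplace_scale_cdf(t, sigma))


def bayes_threshold(sigma: float, tau: float) -> float:
    """t_B = f_m^{-1}(tau), solved in closed form for the Laplace scale model."""
    return (sigma * tau) ** (-sigma / (sigma - 1.0))
```

For instance, with $\sigma_m = 4$ and $\tau_m = 2$ the Bayes threshold is $1/16$ and the corresponding power is $F_m(1/16) = 1/2$; a grid search over $t$ recovers the same minimizer of the risk up to the grid resolution.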

2.4 Assumptions on Bayes power and Sparsity 

Under Assumption (A($F_m, \tau_m$)), let us denote the power of Bayes procedure by

$C_m = F_m(t_m^B) \in (0, 1)$.    (13)

In our setting, we will typically assume that the signal is sparse while the power $C_m$ of
Bayes procedure remains away from 0 or 1:

$\exists (C_-, C_+)$ s.t. $\forall m \ge 2$, $0 < C_- \le C_m \le C_+ < 1$;    (BP)
$(\tau_m)_m$ is such that $\tau_m \to +\infty$ as $m \to +\infty$.    (Sp)

First note that Assumption (Sp) is very weak: it is required as soon as we assume some
sparsity in the data. As a typical instance, $\tau_m = m^{\beta}$ satisfies (Sp), for any
$\beta > 0$. Next, Assumption (BP) means that the best procedure is able to detect a "moderate"
amount of signal. In [ ], a slightly stronger assumption has been introduced:

$\exists C \in (0, 1)$ s.t. $C_m \to C$ as $m$ tends to infinity,    (VD)

which is referred to as "the verge of detectability"¹. Condition (BP) encompasses (VD) while
(BP) is more suitable than (VD) to obtain non-asymptotic results. Hence, (BP) will be used
throughout the paper.

In the particular case of location and scale models, (BP) is equivalent to
"$\overline{D}^{-1}(t_m^B) - \mu_m$ is bounded" in the location model and to
"$\overline{D}^{-1}(t_m^B/2)/\sigma_m$ is bounded away from 0 and $\infty$" in the scale model,
respectively. Moreover, while the original parameters of the model are $(\theta_m, \tau_m)$
(for $\theta_m = \mu_m$ or $\sigma_m$), the model can be parametrized in terms of
$(C_m, \tau_m)$ by using (12) and (13). Interestingly, the c.d.f. $F_m$ has the following
interpretation w.r.t. the parameters $(C_m, \tau_m)$: among the family of curves
$\{\overline{D}(\overline{D}^{-1}(\cdot) - \mu)\}_{\mu \in \mathbb{R}}$ in the location model
(or $\{2\overline{D}(\overline{D}^{-1}(\cdot/2)/\sigma)\}_{\sigma > 1}$ in the scale model),
$F_m(\cdot)$ is the unique curve such that the pre-image of $C_m$ has a tangent of slope
$\tau_m$, that is, $f_m(F_m^{-1}(C_m)) = \tau_m$. This is illustrated in Figure 3 for the
Laplace scale model. In this case, $\overline{D}(x) = d(x) = e^{-x}/2$ for $x \ge 0$ and thus
$F_m(t) = t^{1/\sigma_m}$, so that the family of curves is simply
$\{t \mapsto t^{1/\sigma}\}_{\sigma > 1}$.



2.5 pFDR thresholding 

In this section, we introduce pFDR thresholding, which can be seen as a theoretical (oracle)
substitute for FDR thresholding. The pFDR will be useful in our analysis because it is much
easier to study than the FDR. As introduced by [ ], the positive false discovery rate is defined
as

$\mathrm{pFDR}_m(t) = \pi_{0,m}\, t / G_m(t) = P(H_i = 0 \mid p_i \le t)$,

for any $t \in (0, 1)$ and $G_m(t) = \pi_{0,m} t + (1 - \pi_{0,m}) F_m(t)$. Under Assumption
(A($F_m, \tau_m$)), the function $\Psi_m : t \in (0, 1) \mapsto F_m(t)/t$ is decreasing from
$f_m(0^+)$ to 1, with $f_m(0^+) \in (1, +\infty]$. Hence, $\mathrm{pFDR}_m(\cdot)$ is
increasing from $(1 + f_m(0^+)/\tau_m)^{-1}$ to $\pi_{0,m}$ and the following result holds.

Lemma 2.4. Assume (A($F_m, \tau_m$)) and
$\alpha_m \in ((1 + f_m(0^+)/\tau_m)^{-1}, \pi_{0,m})$. Then the equation
$\mathrm{pFDR}_m(t) = \alpha_m$ has a unique solution $t = t_m^{\star}(\alpha_m) \in (0, 1)$,
given by

$t_m^{\star}(\alpha_m) = \Psi_m^{-1}(q_m \tau_m)$,    (14)

for $q_m = \alpha_m^{-1} - 1 > 0$ and $\Psi_m(t) = F_m(t)/t$.
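Lemma 2.4 can be illustrated with a short Python sketch in the Laplace scale model, assuming $F_m(t) = t^{1/\sigma_m}$ so that $F_m(t)/t = t^{1/\sigma_m - 1}$ can be inverted in closed form. The function names are hypothetical and the snippet is a numerical check, not part of the proof.

```python
def pfdr(t: float, sigma: float, tau: float) -> float:
    """pFDR_m(t) = pi0 * t / G_m(t), with G_m(t) = pi0 * t + (1 - pi0) * F_m(t)
    and F_m(t) = t^(1/sigma) (Laplace scale model, an assumption of this sketch)."""
    pi0 = tau / (1.0 + tau)
    g = pi0 * t + (1.0 - pi0) * t ** (1.0 / sigma)
    return pi0 * t / g


def pfdr_threshold(alpha: float, sigma: float, tau: float) -> float:
    """Solve F_m(t) / t = q * tau with q = 1/alpha - 1.

    Here F_m(t) / t = t^(1/sigma - 1), hence t = (q * tau)^(-sigma / (sigma - 1)).
    Requires alpha < pi0, i.e. q * tau > 1, so that the solution lies in (0, 1).
    """
    q = 1.0 / alpha - 1.0
    return (q * tau) ** (-sigma / (sigma - 1.0))
```

With $\sigma_m = 4$ and $\tau_m = 2$, the level $\alpha_m = 1/5$ (that is, $q_m = 4$) gives the threshold $1/16$, at which the pFDR equals $1/5$ exactly; the level $\alpha_m = 1/2$ gives the larger threshold $2^{-4/3}$.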

¹ However, we emphasize that (VD) does not refer to the so-called "detection" problem, as
investigated in [9, 20] for instance.
Figure 3: Left: plot of the family of curves $\{t \mapsto t^{1/\sigma}\}$ for 57 values of
$\sigma$ (indexed by $j = 0, \dots, 56$) (thin solid curves). Right: choice (thick solid curve)
within the family of curves $\{t \mapsto t^{1/\sigma}\}_{\sigma > 1}$ that fulfills (12) and
(13) for $C_m = 1/2$ (given by the dashed horizontal line) and $\tau_m = 2$ (slope of the
dashed oblique line). This gives $\sigma_m = 4$. Bayes threshold is given by the dotted
vertical line.



The threshold $t_m^{\star}(\alpha_m)$ is called the pFDR threshold at level $\alpha_m$. The
pFDR threshold is unknown because it depends on $\tau_m$ and on the distribution of the data.
However, its interest lies in that it is close to the FDR threshold, which is observable, as we
will see in Section 3. For short, $t_m^{\star}(\alpha_m)$ will be denoted by $t_m^{\star}$ when
not ambiguous.

Let us discuss the condition $\alpha_m \in ((1 + f_m(0^+)/\tau_m)^{-1}, \pi_{0,m})$ in
Lemma 2.4. First, as $\pi_{0,m} > 1/2$ (because $\tau_m > 1$) and $\alpha_m$ will be taken
smaller than 1/2 in the sequel, we will always have $\alpha_m < \pi_{0,m}$. Second, when
$f_m(0^+) = +\infty$ (for a fixed $m$), we have $(1 + f_m(0^+)/\tau_m)^{-1} = 0$. This case
corresponds to the so-called "non-critical" case, see [ ]. It is satisfied in the location and
scale models considered in Section 4.

Next, Lemma 2.4 shows that the quantity $q_m = \alpha_m^{-1} - 1 > 0$ is a quantity of
interest. As $\alpha_m = (1 + q_m)^{-1}$, considering $\alpha_m$ or $q_m$ is equivalent.

Definition 2.5. For each $\alpha_m \in (0, 1)$, the corresponding quantity
$q_m = \alpha_m^{-1} - 1 > 0$ is called the recovery parameter (associated to $\alpha_m$).

Since we would like to have $t_m^{\star} = \Psi_m^{-1}(q_m \tau_m)$ close to
$t_m^B = f_m^{-1}(\tau_m)$, the recovery parameter can be interpreted as a correction factor
that cancels the difference between $\Psi_m(t) = F_m(t)/t$ and $f_m(t) = F_m'(t)$. In the
sequel, we will always consider $q_m \ge 1$ (that is, $\alpha_m \le 1/2$), because choosing
$q_m$ bounded away from 1 from below (or equivalently $\alpha_m \ge \alpha_- > 1/2$) is always
sub-optimal, see Appendix B. Clearly, the best choice for the recovery parameter is such that
$t_m^{\star} = t_m^B$, that is,

$q_m^{opt} = \tau_m^{-1}\, \Psi_m(f_m^{-1}(\tau_m))$,    (15)

which is an unknown quantity, called the optimal recovery parameter. Note that from the
concavity of $F_m$, we have $\Psi_m(t) \ge f_m(t)$ and thus $q_m^{opt} \ge 1$. As an
illustration, for the Laplace scale model, we have $\sigma_m f_m(t) = \Psi_m(t)$ and thus the
optimal recovery parameter is $q_m^{opt} = \sigma_m$. This is represented in the left panel of
Figure 4; the pFDR threshold for $q_m = 1$ is the point $t$ where the line between $(0, 0)$ and
$(t, F_m(t))$ is parallel to the tangent of $F_m$ at $t_m^B$. In the right panel of Figure 4,
the same representation is given for $G_m(t) = \pi_{0,m} t + (1 - \pi_{0,m}) F_m(t)$. Hence,
the slopes are transformed via $u \mapsto \pi_{0,m} + (1 - \pi_{0,m}) u$. Note that, by
definition, the pFDR threshold at level $\alpha_m$ is such that
$G_m(t)/t = \pi_{0,m}/\alpha_m$, or, equivalently, $G_m(t)/t = \pi_{0,m}(q_m + 1)$.




Figure 4: Recovering Bayes risk with pFDR in the Laplace scale model. Left: plot of $F_m$
(thick solid line); $C_m = 1/2$ (Y-coordinate of the horizontal solid line) and $\tau_m = 2$
(slope of the oblique solid straight line). Right: plot of
$G_m(t) = \pi_{0,m} t + (1 - \pi_{0,m}) F_m(t)$ (thick solid line);
$G_m(t_m^B) = 2 t_m^B/3 + 1/6$ is given by the Y-coordinate of the horizontal solid line
and $2\pi_{0,m} = 4/3$ is the slope of the oblique solid straight line. On both pictures, pFDR
thresholding is represented for $q_m = 1$ (i.e. $\alpha_m = 1/2$) (dotted) and for the optimal
recovery parameter $q_m = \sigma_m = 4$ (i.e. $\alpha_m = 1/5$) (dashed).



2.6 FDR thresholding 

The FDR threshold has been introduced in [ ] by Benjamini and Hochberg. As noted later
on by many authors (see, e.g., [17, 21]), it can be expressed as a function of the empirical
c.d.f. $\hat G_m$ of the p-values in the following way. For any $\alpha_m \in (0, 1)$ let us
define

$\hat t_m^{BH}(\alpha_m) = \max\{t \in [0, 1] : \hat G_m(t) \ge t/\alpha_m\}$.    (16)

We simply denote $\hat t_m^{BH}(\alpha_m)$ by $\hat t_m^{BH}$ when not ambiguous. Classically,
this implies that $t = \hat t_m^{BH}$ solves the equation $\hat G_m(t) = t/\alpha_m$ (this can
be easily shown by using (16) together with the fact that $\hat G_m(\cdot)$ is a
non-decreasing function). Hence, according to Lemma 2.4, $\hat t_m^{BH}$ can be seen as an
empirical substitute of the pFDR threshold at level $\alpha_m \pi_{0,m}$, in which the
theoretical c.d.f. $G_m(t) = \pi_{0,m} t + \pi_{1,m} F_m(t)$ of the p-values has been replaced
by the empirical c.d.f. $\hat G_m$ of the p-values.
Next, we would like to make the following important points about $\hat t_m^{BH}$: first, (16)
only involves observable quantities (once $\alpha_m$ has been chosen), so that the threshold
$\hat t_m^{BH}$ only depends on the data. This is further illustrated on the left panel of
Figure 5. Second, let us recall the original (equivalent) definition of [ ], which makes
$\hat t_m^{BH}$ very simple to use in practice: considering the order statistics
$0 = p_{(0)} \le p_{(1)} \le \dots \le p_{(m)}$ of the p-value family, we can write
$\hat t_m^{BH} = \alpha_m \hat k/m$, where
$\hat k = \max\{k \in \{0, 1, \dots, m\} : p_{(k)} \le \alpha_m k/m\}$. Third, for technical
reasons, we chose to modify the value of $\hat t_m^{BH}$ in the special case where
$\hat k = 0$. When $\hat k = 0$, we simply replace the threshold by the so-called Bonferroni
threshold $\alpha_m/m$.

Definition 2.6. The FDR threshold at level $\alpha_m$ is defined by

$\hat t_m^{FDR} = \hat t_m^{BH} \vee (\alpha_m/m)$,    (17)

where $\hat t_m^{BH}$ is defined by (16).

This modification allows us to deal with the "hypersparse" case $\tau_m \propto m$, as we will
see later on. The threshold $\hat t_m^{FDR}$ is the one that we use throughout this paper.
However, note that we do not need to perform this modification when considering the risk
$\widetilde{R}_m$ defined in (10) instead of $R_m$, see discussion in Section 6.1.

Finally, we easily check that (17) and Algorithm 1.1 lead to the same classification
procedure in the special case where the p-values come from a location model (obviously, the
same holds for a scale model).

[Figure 5 panels: "FDR threshold" (left); "FDR, pFDR and Bayes thresholds" (right); horizontal axes from 0.0 to 1.0.]

Figure 5: Left: illustration of the FDR threshold (17): e.c.d.f. of the p-values (solid line),
line of slope $1/\alpha_m$ (dotted line), FDR threshold at level $\alpha_m$ (X-coordinate of the vertical
dashed-dotted line). Right: illustration of the FDR threshold as an empirical surrogate for
the pFDR threshold; compared to the left picture, we added the pFDR threshold at level
$\alpha_m \pi_{0,m}$ (dotted vertical line) and Bayes threshold (dashed vertical line). In both panels, we
consider the Laplace scale model with $C_m = 0.5$; $m = 50$; $\beta = 0.2$; $\tau_m = m^{\beta}$; $\alpha_m = 0.4$.
3 General results

Choosing $q_m$ instead of $q_m^{opt}$ in pFDR thresholding induces some excess risk. Our first main
result aims at quantifying the latter. Remember that a threshold $t_m$ is said to be asymptotically
optimal if $R_m(t_m) \sim R_m(t_m^B)$ as $m$ tends to infinity.

Theorem 3.1. Assume (A($F_m$, $\tau_m$)) and consider the pFDR threshold $t_m^\star$ at a level $\alpha_m \in
((1 + f_m(0^+)/\tau_m)^{-1}, \pi_{0,m})$ corresponding to a recovery parameter $q_m = \alpha_m^{-1} - 1$. Consider
$q_m^{opt} > 1$ the optimal recovery parameter given by (15). Then the following holds:

(i) if $\alpha_m \le 1/2$, we have for any $m \ge 2$,

$$R_m(t_m^\star) - R_m(t_m^B) \le \pi_{1,m} \{ (C_m/q_m - C_m/q_m^{opt}) \vee \gamma_m \}, \qquad (18)$$

where we let $\gamma_m = (C_m - F_m(\Psi_m^{-1}(q_m \tau_m)))_+$. In particular, under (BP), if $\alpha_m \to 0$
and $\gamma_m \to 0$, the pFDR threshold $t_m^\star$ is asymptotically optimal at rate $\alpha_m + \gamma_m$.

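(ii) we have for any $m \ge 2$,

$$\frac{R_m(t_m^\star) - R_m(t_m^B)}{R_m(t_m^B)} \ge \frac{\pi_{1,m} \left(1 - (1 - q_m^{-1})_+ F_m(q_m^{-1} \tau_m^{-1})\right)}{R_m(t_m^B)} - 1. \qquad (19)$$

In particular, under (BP), if $R_m(t_m^B) \sim \pi_{1,m}(1 - C_m)$ and if

$$\liminf_m \left\{ \frac{1 - (1 - q_m^{-1})_+ F_m(q_m^{-1} \tau_m^{-1})}{1 - C_m} \right\} > 1, \qquad (20)$$

then $t_m^\star$ is not asymptotically optimal.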

Theorem 3.1 is proved in Section 7. The assumption $\alpha_m \le 1/2$ in Theorem 3.1 (i) makes it possible to
get $C_m/q_m$ instead of $1/q_m$ in the RHS of (18). This assumption is not restrictive because
choosing $\alpha_m \ge \alpha_- > 1/2$ never leads to an asymptotically optimal procedure, as proved in
Appendix B. Also note that the RHS of (18) is equal to zero when $q_m = q_m^{opt}$, which shows
that this bound is sharp in this case.

The bound (18) induces the following trade-off for choosing $\alpha_m$: on the one hand, $\alpha_m$
has to be chosen small enough to make $C_m/q_m$ small; on the other hand, $\gamma_m$ increases as $\alpha_m$
decreases to zero. The lower bound (19) is useful to identify regimes of $\alpha_m$ that do not lead
to an asymptotically optimal pFDR thresholding.
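
The trade-off in (18) can be checked numerically in a case where everything is explicit. The sketch below assumes a Laplace scale model, for which the alternative c.d.f. of the p-values is F_m(t) = t^(1/sigma) (see Section 4.4), so that the Bayes and pFDR thresholds have closed forms. The function name and the parameter values are illustrative assumptions, not taken from the paper.

```python
import math

def pfdr_excess_risk_laplace_scale(tau, sigma, alpha):
    """Return (excess_risk, bound) for the pFDR threshold at level alpha.

    Assumptions: Laplace scale model with F_m(t) = t**(1/sigma), sigma > 1,
    alpha <= 1/2 and q * tau >= 1, where q = 1/alpha - 1 and tau = pi0/pi1.
    `bound` is the right-hand side of (18).
    """
    pi1 = 1.0 / (1.0 + tau)
    pi0 = 1.0 - pi1
    q = 1.0 / alpha - 1.0                 # recovery parameter
    expo = sigma / (sigma - 1.0)

    def F(t):
        return t ** (1.0 / sigma)

    def risk(t):
        # type I error + type II error of the deterministic threshold t
        return pi0 * t + pi1 * (1.0 - F(t))

    t_bayes = (sigma * tau) ** (-expo)    # solves f_m(t) = tau
    t_pfdr = (q * tau) ** (-expo)         # solves F_m(t)/t = q * tau
    C = F(t_bayes)                        # Bayes power
    gamma = max(C - F(t_pfdr), 0.0)
    excess = risk(t_pfdr) - risk(t_bayes)
    bound = pi1 * max(C / q - C / sigma, gamma)
    return excess, bound
```

In this model the optimal recovery parameter equals sigma, so calling the function with alpha = 1/(1 + sigma) should return an excess risk that is zero up to rounding, while smaller or larger levels give a positive excess risk that stays below the bound.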

Next, we provide our second main result, which deals with FDR thresholding. 

Theorem 3.2. Let $\varepsilon \in (0,1)$; assume (A($F_m$, $\tau_m$)) and consider the FDR threshold $\hat{t}_m^{FDR}$ at
level $\alpha_m > (1 - \varepsilon)^{-1}(\pi_{0,m} + \pi_{1,m} f_m(0^+))^{-1}$. Then the following holds: for any $m \ge 2$,

j:FDR\ d {+B\ ^ ^ . O^m 



1 - «m (1 - «m)^ 

+ {7;. A (7^ + e-^^(--+^)-^(^--^^) } , (21) 

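for $\gamma_m^{\varepsilon} = (C_m - F_m(\Psi_m^{-1}(q_m^{\varepsilon} \tau_m)))_+$ with $q_m^{\varepsilon} = (\alpha_m \pi_{0,m}(1 - \varepsilon))^{-1} - 1$ and $\gamma'_m = (C_m -
F_m(\alpha_m/m))_+$. In particular, under (BP) and assuming $\alpha_m \to 0$,

(i) if $\tau_m/m = o(1)$, $\gamma_m^{\varepsilon} \to 0$ and $\forall \kappa > 0$, $e^{-\kappa m/\tau_m} = O(\gamma_m^{\varepsilon})$, the FDR threshold $\hat{t}_m^{FDR}$ is
asymptotically optimal at rate $\alpha_m + \gamma_m^{\varepsilon}$.

(ii) if $m/\tau_m \to \ell \in (0, +\infty)$ with $\gamma'_m \to 0$, the FDR threshold $\hat{t}_m^{FDR}$ is asymptotically optimal
at rate $\alpha_m + \gamma'_m$.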

Theorem 3.2 is proved in Section 7. The proof mainly follows the methodology of [ ], but
is more general and concise. The main argument for the proof is that the FDR threshold
$\hat{t}_m^{BH}(\alpha_m)$ is either well concentrated around the pFDR threshold $t_m^\star(\alpha_m \pi_{0,m})$ (as illustrated
in the right panel of Figure 5) or close to the Bonferroni threshold $\alpha_m/m$.

Let us comment briefly on Theorem 3.2: first, as in the pFDR case, choosing $\alpha_m$ such
that the bound in (21) is minimal involves a trade-off because $\gamma_m^{\varepsilon}$ and $\gamma'_m$ are quantities that
increase when $\alpha_m$ decreases to zero. Second, let us note that items (i) and (ii) in Theorem 3.2
are intended to cover regimes where $\tau_m = m^{\beta}$ with $\beta \in (0,1)$ (in which FDR is close to
pFDR) and the "hyper-sparse" regime where $\tau_m \propto m$ (in which the FDR threshold is close to
the Bonferroni threshold), respectively. Finally, the bounds and convergence rates derived in
Theorems 3.1 and 3.2 strongly depend on the nature of $F_m$. We provide in the next section
a more explicit expression of the latter in the particular cases of location and scale models.

Remark 3.3 (Conservative upper-bound for $\gamma_m$). By concavity of $F_m$, we have $q_m \tau_m =
\Psi_m(t_m^\star) \ge f_m(t_m^\star)$, which provides

$$\gamma_m \le C_m - F_m(f_m^{-1}(q_m \tau_m)) \in [0, 1). \qquad (22)$$

When $f_m^{-1}$ is easier to use than $\Psi_m^{-1}$, it is tempting to use relation (22) to upper bound the
excess risk in Theorems 3.1 and 3.2. However, this can inflate the resulting upper bound
too much, as we will discuss in Section 6.3 for the case of a Gaussian density (for which this
results in an additional log log factor in the bound).

4 Application to location and scale models 
4.1 Bayes risk and optimal recovery parameter 

A preliminary task is to study the behavior of $t_m^B$, $R_m(t_m^B)$ and $q_m^{opt} = C_m/(\tau_m t_m^B)$ both in
location and scale models. Although finite sample inequalities are given in Section 8.2, we
only report here some resulting asymptotic relations, for brevity. Let us define the following
rates, which will be useful throughout the paper: 

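$$r_m^{loc} = \phi' \circ \phi^{-1}\left(\log \tau_m + \phi(|\bar{D}^{-1}(C_m)|)\right) \qquad (23)$$
$$r_m^{sc} = (\mathrm{Id} \times \phi') \circ \phi^{-1}\left(\log \tau_m + \phi(\bar{D}^{-1}(C_m/2))\right), \qquad (24)$$

where Id denotes the identity function, hence $(\mathrm{Id} \times \phi')(x) = x\phi'(x)$. Under (Sp), we easily
check that the rates $r_m^{loc}$ (resp., $r_m^{sc}$) tend to infinity, given that $\phi$ satisfies (A'($\phi$)) (resp.,
(A($\phi$))). Table 1 provides some useful calculations for $\phi$ in the case where it comes from a
$\zeta$-Subbotin density. In that case, we easily derive $r_m^{loc} = (\zeta \log \tau_m + |\bar{D}^{-1}(C_m)|^{\zeta})^{1 - 1/\zeta}$ and
$r_m^{sc} = \zeta \log \tau_m + (\bar{D}^{-1}(C_m/2))^{\zeta}$.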

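Table 1: Notation and some useful calculations for the $\zeta$-Subbotin density.

  $d(x)$                                            $L_\zeta^{-1} e^{-|x|^{\zeta}/\zeta}$
  $L_\zeta$                                         $\int_{-\infty}^{+\infty} e^{-|x|^{\zeta}/\zeta} dx$
  $d(0)$                                            $1/L_\zeta$
  $\phi(u)$                                         $u^{\zeta}/\zeta + \log L_\zeta$
  $\phi' \circ \phi^{-1}(v)$                        $(\zeta v - \zeta \log L_\zeta)^{1 - 1/\zeta}$
  $\phi^{-1}(v) \times \phi' \circ \phi^{-1}(v)$    $\zeta v - \zeta \log L_\zeta$
  $\phi''(u)$                                       $(\zeta - 1) u^{\zeta - 2}$
  $\phi''(u)/(\phi'(u))^2$                          $(\zeta - 1) u^{-\zeta}$

Proposition 4.1. Consider $d(x) = e^{-\phi(|x|)}$ for a function $\phi$ satisfying (A($\phi$)) in the scale
model or (A'($\phi$)) in the location model. Let $(\tau_m, C_m) \in (1, \infty) \times (0,1)$ be the parameters of
the model. Let $r_m > 0$ be equal to $r_m^{loc}$ defined by (23) in the location model or to $r_m^{sc}$ defined
by (24) in the scale model. Then, under (BP) and (Sp), we have

$$\pi_{0,m} t_m^B = O\left(R_m(t_m^B)/r_m\right) \qquad (25)$$
$$R_m(t_m^B) \sim \pi_{1,m}(1 - C_m). \qquad (26)$$

Furthermore, for a $\zeta$-Subbotin density (2),

$$q_m^{opt} \sim \begin{cases} \dfrac{C_m}{d(\bar{D}^{-1}(C_m))} (\zeta \log \tau_m)^{1 - 1/\zeta} & \text{for the location model, } \zeta > 1, \\[2mm] \dfrac{C_m}{2\bar{D}^{-1}(C_m/2)\, d(\bar{D}^{-1}(C_m/2))}\, \zeta \log \tau_m & \text{for the scale model, } \zeta \ge 1. \end{cases} \qquad (27)$$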

From (25) and (26), the probability of a type I error $\pi_{0,m} t_m^B$ is always of smaller order than
the probability of a type II error $\pi_{1,m}(1 - C_m)$, under (BP) and (Sp). The latter has
already been observed in [ ] in the particular case of a Gaussian scale model. Next, for a
$\zeta$-Subbotin density and $\tau_m = m^{\beta}$, $0 < \beta \le 1$, (27) gives rise to the choices of $\alpha_m$ as functions
of $(\beta_0, C_0)$ defined by (5) and (6), respectively, which are described in the introduction of
the paper.

Remark 4.2. From (26) and since the risk of null thresholding is $R_m(0) = \pi_{1,m}$, a substantial
improvement over the null threshold can only be expected in the regime where $C_m \ge C_-$, where
$C_-$ is "far" from 0.



4.2 Finite sample oracle inequalities 

The following result can be derived from Theorem 3.1 (i) and Theorem 3.2. It is proved in
Section 8.3.

Corollary 4.3. Consider $d(x) = e^{-\phi(|x|)}$ for a function $\phi$ satisfying (A($\phi$)) in the scale model
or (A'($\phi$)) in the location model. Let $(\tau_m, C_m) \in (1, \infty) \times (0,1)$ be the parameters of the model.
Let $r_m > 0$ and $K_m > 0$ be defined as follows:

• in the location model, $r_m = r_m^{loc}$ defined by (23) and $K_m = d(0)$;

• in the scale model, $r_m = r_m^{sc}$ defined by (24) and $K_m = 2\bar{D}^{-1}(C_m/2)\, d(\bar{D}^{-1}(C_m/2))$.

Let $\alpha_m \in (0, 1/2)$ and denote the corresponding recovery parameter by $q_m = \alpha_m^{-1} - 1$. Consider
$q_m^{opt} \ge 1$ the optimal recovery parameter given by (15). Let $\nu \in (0,1)$. Then:

(i) The pFDR threshold t~^ at level am defined by (14) satisfies that for any m > 2 such 
that rm > c;^ft^{log{qm/qm^) - logz/), 

KAt'J - R^(C) < .1,™ { - 1^) V j J ; (28) 

(a) Let e G (0,1), Di^m = - log(i^7ro,m(l - ^)) and D2^m = log(z/~^C^ r^^m). Then the 
FDR threshold at level am defined by (17) satisfies that, for any a G {1,2}, for 

any m>2 such that rm > ^r^^^(log(a;;,V^m^) + ^a,m); 

o {fFDR^ o (+B^ ^ ^ { . ^ {logja-^ / q?^^) + Da^m)+ \ 

Rm\tm ) - LCm[tm) < ^l,m K 

\l- am Tm J 

+ 7^^^^ + ^i,ml{a = l}e— (29) 
(1 - amY 

Corollary 4.3 (ii) contains two distinct cases. The case $a = 1$ should be used when $m/\tau_m$
is large, because the remaining term containing the exponential becomes small (whereas $D_{1,m}$
is approximately constant). The case $a = 2$ is intended to deal with the regime where $m/\tau_m$
is not large, because $D_{2,m}$ is of the order of a constant in that case. In any case, $K_m$ is
approximately constant with $m$ under (BP). For instance, we can choose $\varepsilon = \nu = 1/2$ to use
(28) and (29).

The form of our finite sample oracle inequalities (28) and (29) is useful to derive explicit
rates of convergence, as we will see in the next section. Moreover, let us mention that (28) and
(29) can be used to investigate the issue of choosing $\alpha_m$ for pFDR/FDR thresholding, simply
by minimizing these upper bounds in $\alpha_m$, after having removed the negligible remaining
terms. However, since the resulting minimum is likely to depend on some artifacts coming
from the proofs (constants for instance), we prefer to use the choice induced by $q_m^{opt}$ described
in Section 4.1. Let us finally mention that an exact computation of the excess risk of pFDR
thresholding can be derived in the Laplace case, see Section 4.4.

4.3 Optimality with rates 

Let us recall that a threshold $t_m$ is said to be optimal at rate $\rho_m = o(1)$ if there exists some
constant $D > 0$ such that for all $m \ge 2$,

$$R_m(t_m) - R_m(t_m^B) \le D \rho_m R_m(t_m^B), \qquad (30)$$

and is said to be asymptotically optimal if $R_m(t_m) \sim R_m(t_m^B)$. Under (BP) and (Sp), Corollary 4.3
shows that such a result holds for pFDR/FDR thresholding, with an explicit $\rho_m$. Furthermore,
using Theorem 3.1 (ii), we can establish a necessary and sufficient condition on $\alpha_m$ for which
pFDR thresholding is asymptotically optimal. For this, we should introduce the following
additional assumption on $\phi$:

$\phi$ satisfies (A($\phi$)), $\phi$ is $C^2$ on $(0, \infty)$ with $\phi''/(\phi')^2$ non-increasing on $(0, \infty)$. (B($\phi$))

We will also consider the following assumption, either for $\psi = \phi' \circ \phi^{-1}$ (location) or $\psi =
(\mathrm{Id} \times \phi') \circ \phi^{-1}$ (scale):

$\psi(x + o(x)) \sim \psi(x)$ and $\psi(x) = O(x)$ as $x \to +\infty$. (C($\psi$))

Note that assuming (BP) and (Sp), we have $r_m \sim \psi(\log \tau_m)$ under (C($\psi$)), either for $r_m = r_m^{loc}$
and $\psi = \phi' \circ \phi^{-1}$ or for $r_m = r_m^{sc}$ and $\psi = (\mathrm{Id} \times \phi') \circ \phi^{-1}$. Also, from Table 1, when considering a
$\zeta$-Subbotin density, Assumptions (B($\phi$)) and (C($\psi$)) with $\psi = \phi' \circ \phi^{-1}$ and $\psi = (\mathrm{Id} \times \phi') \circ \phi^{-1}$
are all fulfilled.

Corollary 4.4. Consider $d(x) = e^{-\phi(|x|)}$ for a function $\phi$ satisfying (A($\phi$)) in the scale model
or (A'($\phi$)) in the location model. Let $(\tau_m, C_m) \in (1, \infty) \times (0,1)$ be the parameters of the model.
Let $r_m > 0$ and $\psi(\cdot)$ be defined as follows:

• in the location model, $r_m = r_m^{loc}$ defined by (23) and $\psi = \phi' \circ \phi^{-1}$;

• in the scale model, $r_m = r_m^{sc}$ defined by (24) and $\psi = (\mathrm{Id} \times \phi') \circ \phi^{-1}$.

Assume that (BP) and (Sp) hold. Consider the pFDR threshold $t_m^\star$ at a level $\alpha_m \in (0,1)$.
Consider $q_m^{opt} > 1$, the optimal recovery parameter given by (15). Then the following holds:

(i) The pFDR threshold $t_m^\star$ is asymptotically optimal if

$$\alpha_m \to 0 \text{ and } \log \alpha_m = o(r_m), \qquad (31)$$

in which case it is optimal at rate $\rho_m = \alpha_m + (\log(\alpha_m^{-1}/q_m^{opt}))_+/r_m$. Additionally, if $\phi$
satisfies (B($\phi$)) and $\psi$ satisfies (C($\psi$)), the pFDR threshold $t_m^\star$ is asymptotically optimal
if and only if (31) holds.

(a) Further assume that there exists A > such that ^p{x) — 0(e^^) for x +oc and that 
the sparsity regime Tm satisfies 

m/Tm > (logTm)^^^ for some 9 > 0; or m/rm ^ G (0, +oc). (32) 

Then, the FDR threshold $\hat{t}_m^{FDR}$ at a level $\alpha_m$ satisfying (31) is optimal at rate $\rho_m =
\alpha_m + (\log(\alpha_m^{-1}/q_m^{opt}))_+/r_m$.

Let us first note that the two regimes described in (32) are the same as those proposed
in [6]. They cover all possible sparse scenarios when $\tau_m = m^{\beta}$ with $\beta \in (0,1]$. Next, to
illustrate Corollary 4.4, let us consider the case of a $\zeta$-Subbotin density under the sparsity
regime $\tau_m = m^{\beta}$ for a fixed $\beta$ in $(0,1]$. In this case, the optimality condition (31) has a more
explicit expression, see Table 2. Corollary 4.4 implies that this condition is necessary and
sufficient for pFDR optimality, and sufficient for FDR optimality. Furthermore, it implies
that the convergence rate of the relative excess risk is $\rho_m = \alpha_m + (\log(\alpha_m^{-1}/q_m^{opt}))_+/(\log m)^{\gamma}$ with $\gamma = 1 - \zeta^{-1}$
(resp., $\gamma = 1$) for the location (resp., scale) case. According to the order of magnitude of
$q_m^{opt}$ (see Table 2), this proves that choosing $q_m \propto (\log m)^{\gamma}$ yields an optimal pFDR/FDR
thresholding at rate $\rho_m = 1/(\log m)^{\gamma}$. For instance, the latter holds for the choices of $\alpha_m$
defined by (5) in the location case and by (6) in the scale case, respectively.

We can legitimately ask whether the rate $\rho_m = 1/(\log m)^{\gamma}$ can be improved. We show in
the next section that this rate is the smallest that we can obtain over a non-trivial sparsity
class, for pFDR thresholding in the Laplace scale model.
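Table 2: Summary of our results for a $\zeta$-Subbotin density in the sparsity regime $\tau_m = m^{\beta}$,
$0 < \beta \le 1$ and under (BP). In each row, the first entry is for the $\zeta$-Subbotin location model ($\zeta > 1$) and the second entry is for the $\zeta$-Subbotin scale model ($\zeta \ge 1$).

Model
  $F_m(t)$:  $\bar{D}(\bar{D}^{-1}(t) - \mu_m)$  |  $2\bar{D}(\bar{D}^{-1}(t/2)/\sigma_m)$

Sparsity $\tau_m = m^{\beta}$
  Parameter:  $\mu_m \sim (\zeta \beta \log m)^{1/\zeta}$  |  $\sigma_m \sim (\zeta \beta \log m)^{1/\zeta} / \bar{D}^{-1}(C_m/2)$
  $r_m$ in (23) or (24):  $r_m^{loc} \sim (\zeta \beta \log m)^{1 - 1/\zeta}$  |  $r_m^{sc} \sim \zeta \beta \log m$

Bayes threshold
  $q_m^{opt}$ in (15):  $\sim \dfrac{C_m}{d(\bar{D}^{-1}(C_m))} (\zeta \beta \log m)^{1 - 1/\zeta}$  |  $\sim \dfrac{C_m}{2\bar{D}^{-1}(C_m/2)\, d(\bar{D}^{-1}(C_m/2))}\, \zeta \beta \log m$
  $R_m(t_m^B)$:  $\sim m^{-\beta}(1 - C_m)$  |  $\sim m^{-\beta}(1 - C_m)$

FDR/pFDR threshold
  Optimality condition (31):  $\alpha_m \to 0$, $\log \alpha_m = o((\log m)^{1 - 1/\zeta})$  |  $\alpha_m \to 0$, $\log \alpha_m = o(\log m)$
  Rate $\rho_m$ in (30) for $q_m \propto q_m^{opt}$:  $1/(\log m)^{1 - 1/\zeta}$  |  $1/(\log m)$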



4.4 Case of a Laplace scale model and lower bound 

In the Laplace scale model, it turns out that $\Psi_m$ is an explicit function ($\Psi_m(t) = F_m(t)/t =
t^{1/\sigma_m - 1}$), so that we can investigate exact calculations for the pFDR threshold. This is useful
to establish lower bounds on the excess risk of the pFDR threshold and to get a more accurate
upper bound for the FDR threshold.
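
For concreteness, the explicit form can be traced through in a few lines. The sketch below is reconstructed from the definitions, not quoted from the paper's computations; it assumes the Laplace upper tail $\bar{D}(x) = e^{-x}/2$ for $x \ge 0$ and the scale-model expression $F_m(t) = 2\bar{D}(\bar{D}^{-1}(t/2)/\sigma_m)$ with $\sigma_m > 1$.

```latex
\begin{align*}
\bar{D}^{-1}(t/2) &= \log(1/t), \qquad
F_m(t) = 2\bar{D}\big(\log(1/t)/\sigma_m\big) = t^{1/\sigma_m}, \\
\Psi_m(t) &= F_m(t)/t = t^{1/\sigma_m - 1}, \qquad
f_m(t) = F_m'(t) = \sigma_m^{-1}\, t^{1/\sigma_m - 1} = \Psi_m(t)/\sigma_m, \\
f_m(t_m^B) = \tau_m
  &\;\Longrightarrow\; t_m^B = (\sigma_m \tau_m)^{-\sigma_m/(\sigma_m - 1)}, \qquad
  C_m = F_m(t_m^B) = (\sigma_m \tau_m)^{-1/(\sigma_m - 1)}, \\
q_m^{opt} &= \frac{C_m}{\tau_m t_m^B} = \frac{\Psi_m(t_m^B)}{\tau_m} = \sigma_m, \qquad
(\sigma_m - 1)\log(1/C_m) = \log \sigma_m + \log \tau_m .
\end{align*}
```

The last identity gives $\sigma_m \sim \log \tau_m / \log(1/C_m)$ when $\tau_m \to \infty$ with $C_m$ bounded away from 0 and 1.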

Proposition 4.5. Consider the Laplace case $\phi(x) = x + \log 2$ and the corresponding scale
model with parameters $(\tau_m, C_m) \in (1, \infty) \times (0,1)$. Let $\alpha_m \in (0, 1/2)$ and $q_m = \alpha_m^{-1} - 1$ be the
corresponding recovery parameter.

(i) Let $g : x \in \mathbb{R} \mapsto e^{-x} + x - 1 \in \mathbb{R}_+$. Then the pFDR threshold $t_m^\star$ at level $\alpha_m$ satisfies
that for any $m \ge 2$,

Rm{t^m) ^m{tm) Cm^l.m ( — — ^^^^^ + Sm^ , (33) 

V cr^ J 

for the remaining term 6m = 9 (^^^fe^) {Qm' " 1) + "^^^^(^^^n' " Q;n')' 

(a) Let £ E (0,1); Di^m — — log(7ro,m(l — ^)) ci'^d i?2,m — log(m/T^). Then the FDR 
threshold t^^ at level am satisfies that for any a G {1,2}; for any m > 2, 

Rmiim ) ~ -^m{tm) 

. [ , (log(a;;,V^m) +£>a,m)+\ , O^m/m 
S ^l,m \- Cm + 



I - am CTm-'^ J [1 - a, 



m J 

m ) 



Un n.^r./ ^g^gm (1 - (log(a^Va^) + Dl,mj^j^ ^ .... 

+ 7ri,^l|a= l|exp<^ -— — — ^. (34) 

Proposition 4.5 is proved in Section 8.5. Expression (33) results from direct calculations
while inequality (34) relies on Theorem 3.2. As we consider the Laplace scale model, we can
easily check that the optimal recovery parameter is $\sigma_m$, that is, we have $\Psi_m^{-1}(\sigma_m \tau_m) = t_m^B$.
Expression (33) gives the excess risk when choosing $q_m$ instead of $\sigma_m$ as recovery parameter
in the pFDR threshold, which is proved to strongly depend on the behavior of $g(\cdot)$. Next,
inequality (34) can be seen as an improvement over (29) in the special case of a Laplace
scale model: while $K_m/r_m$ is of the same order as $C_m/\sigma_m$ in that case (because $K_m =
C_m \log(1/C_m)$ and $\sigma_m \sim \log \tau_m/(\log(1/C_m))$, by using Table 2 for $\zeta = 1$), the remaining
terms are of smaller order in (34) and inequality (34) is true for any $m \ge 2$.
Furthermore, expression (33) entails the following lower bound.

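Corollary 4.6. Consider the Laplace scale model satisfying assumption (BP) and (Sp). Then
for any $\alpha_m \in (0,1)$ with recovery parameter $q_m = \alpha_m^{-1} - 1$, we have

$$R_m(t_m^\star) - R_m(t_m^B) = o\left(R_m(t_m^B)/(\log \tau_m)\right) \text{ if and only if } q_m \sim \sigma_m. \qquad (35)$$

In particular, for the sparsity regimes $\tau_m = m^{\beta}$, $\beta \in B$, for any subset $B$ of $(0,1]$ containing
more than two elements, we have for any sequence $(\alpha_m)_m$ with $\alpha_m \in (0,1)$ (that does not
depend on $\beta$),

$$\liminf_m \left\{ (\log m) \sup_{\beta \in B} \left\{ \frac{R_m(t_m^\star) - R_m(t_m^B)}{R_m(t_m^B)} \right\} \right\} > 0. \qquad (36)$$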

Corollary 4.6 is proved in Section 8.5. For the sparsity regimes $\tau_m = m^{\beta}$, the equivalence in
(35) shows that the only way to obtain a relative excess risk of order smaller than $(\log m)^{-1}$
is to take $q_m \sim \beta \log m/(\log(1/C_m))$. This choice is not possible when $\beta$ can take several
values. This gives rise to the formulation (36). As a consequence, the rate obtained in
Corollary 4.4 (itself coming from Corollary 4.3) may not be improved for pFDR thresholding
in the particular case of a Laplace scale model.

While the calculations become significantly more difficult in the other models, we believe
that the minimal rate for the relative excess risk of the pFDR is still $(\log m)^{-\gamma}$ for a
$\zeta$-Subbotin density, with $\gamma = 1 - \zeta^{-1}$ (resp. $\gamma = 1$) for the location (resp. scale) case. Also,
since the FDR can be seen as a stochastic variation around the pFDR, we believe that this
rate is also minimal in the case of the FDR, see also the discussion in Section 6.5.



5 Numerical experiments 

In order to complement the convergence results stated above, it is of interest to study the 
behavior of FDR and pFDR thresholding for a small or moderate m. 



5.1 Exact formula and upper-bound for the FDR risk 

The pFDR threshold $t_m^\star$ can be approximated numerically, which allows us to compute
$R_m(t_m^\star)$. Computing $R_m(\hat{t}_m^{FDR})$ is more complicated, because the FDR threshold $\hat{t}_m^{FDR}$ is
not deterministic. However, we can avoid performing cumbersome and somewhat imprecise
simulations to compute $R_m(\hat{t}_m^{FDR})$, by using the approach proposed in [15] and [25]. Using
this methodology, the full distribution of $\hat{t}_m^{FDR}$ may be written as a function of the c.d.f. of the
order statistics of i.i.d. uniform variables. Let for any $k \ge 0$ and for any $(t_1, \dots, t_k) \in [0,1]^k$,

$$\Psi_k(t_1, \dots, t_k) = P(U_{(1)} \le t_1, \dots, U_{(k)} \le t_k),$$

where $(U_i)_{1 \le i \le k}$ is a sequence of i.i.d. uniform variables on $(0,1)$ and with the convention
$\Psi_0(\cdot) = 1$. The $\Psi_k$'s can be evaluated e.g. by using Steck's recursion (see [ ], pages 366-369).
Then, relation (10) in [ ] entails

$$R_m(\hat{t}_m^{FDR}) = \sum_{k=0}^{m} \binom{m}{k} R_m\left(\alpha (k \vee 1)/m\right) G_m(\alpha k/m)^k
\times \Psi_{m-k}\left(1 - G_m(\alpha m/m), \dots, 1 - G_m(\alpha (k+1)/m)\right), \qquad (37)$$

where $G_m(t) = \pi_{0,m} t + \pi_{1,m} F_m(t)$. For reasonably large $m$ ($m \le 10{,}000$ in what follows),
expression (37) can be used for computing the exact risk of FDR thresholding $\hat{t}_m^{FDR}$ in our
experiment.

For larger m, e.g., m = 10^, we did not undertake exact FDR risk calculations, because 
evaluating $\Psi_k$, $k \in \{1, \dots, m\}$, was not feasible in practice, for two reasons. First, available
algorithms are quadratic in $m$. Second, this calculation involved the summation of very large
numbers of very small terms, making the numerical accuracy of the result questionable for
very large $m$. Nevertheless, as $\Psi_k(t_1, \dots, t_k) \le \Psi_k(t_k, \dots, t_k) = (t_k)^k$, we propose to replace
(37) by the following upper bound for the risk:

$$R_m(\hat{t}_m^{FDR}) \le \sum_{k=0}^{m} \binom{m}{k} R_m\left(\alpha (k \vee 1)/m\right) G_m(\alpha k/m)^k \left(1 - G_m(\alpha (k+1)/m)\right)^{m-k}. \qquad (38)$$

This upper bound can be calculated quickly and with great numerical accuracy even for large 
m (e.g., m — 10^). 
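
A bound of the form (38) is a sum of binomial-type terms, so one way to keep it numerically stable for large m is to accumulate each weight in log-space. The following sketch does this under stated assumptions: the function name `fdr_risk_upper_bound` is hypothetical, the alternative c.d.f. `F` is supplied by the caller and must satisfy F(0) = 0, and the risk of a fixed threshold t is taken to be pi0*t + pi1*(1 - F(t)).

```python
import math

def fdr_risk_upper_bound(m, alpha, pi1, F):
    """Upper bound on the risk of FDR thresholding, in the spirit of (38).

    m     : number of p-values
    alpha : nominal level
    pi1   : proportion of alternatives (pi0 = 1 - pi1)
    F     : c.d.f. of the p-values under the alternative, defined on [0, 1]
    """
    pi0 = 1.0 - pi1

    def G(t):
        return pi0 * t + pi1 * F(t)            # mixture c.d.f. of the p-values

    def risk(t):
        return pi0 * t + pi1 * (1.0 - F(t))    # risk of the fixed threshold t

    total = 0.0
    for k in range(m + 1):
        g_k = G(alpha * k / m)
        tail = 1.0 - G(min(alpha * (k + 1) / m, 1.0))
        # log of binom(m, k) * g_k**k * tail**(m - k), with the convention 0**0 = 1
        log_w = math.lgamma(m + 1) - math.lgamma(k + 1) - math.lgamma(m - k + 1)
        if k > 0:
            if g_k <= 0.0:
                continue                        # the weight is exactly zero
            log_w += k * math.log(g_k)
        if m - k > 0:
            if tail <= 0.0:
                continue                        # the weight is exactly zero
            log_w += (m - k) * math.log(tail)
        # k = 0 corresponds to the Bonferroni threshold alpha / m
        total += risk(alpha * max(k, 1) / m) * math.exp(log_w)
    return total
```

For example, `fdr_risk_upper_bound(10**5, 0.1, pi1, lambda t: t ** (1 / 8))` evaluates the bound for a Laplace scale alternative with an assumed scale parameter of 8; the cost is linear in m.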



5.2 Choosing $\alpha_m$

By using (15) in Section 2.5, we propose to choose $\alpha_m$ as follows:

$$\alpha_m^{opt}(\beta_0, C_0) = (1 + q_m^{opt}(\beta_0, C_0))^{-1} \text{ with } q_m^{opt}(\beta_0, C_0) = m^{-\beta_0} C_0 / F_{m,0}^{-1}(C_0), \qquad (39)$$

where $F_{m,0}$ is the c.d.f. of the p-values following the alternative for the model parameters
$(\beta_0, C_0)$. For instance,

$F_{m,0}^{-1}(C_0) = \bar{\Phi}\left( \left( (\bar{\Phi}^{-1}(C_0))^2 + 2\beta_0 \log m \right)^{1/2} \right)$; (Gaussian location)

$F_{m,0}^{-1}(C_0) = 2\bar{\Phi}\left( \bar{\Phi}^{-1}(C_0/2)\, x \right)$, (Gaussian scale)

with $x > 1$ solving $2\beta_0 \log m + 2 \log x = (\bar{\Phi}^{-1}(C_0/2))^2 (x^2 - 1)$;

$q_m^{opt}(\beta_0, C_0) = y$, where $y > 1$ solves $\beta_0 \log m + \log y = (y - 1) \log(1/C_0)$, (Laplace scale)

where $\bar{\Phi}(z)$ denotes $P(Z \ge z)$ for $Z \sim \mathcal{N}(0,1)$. From Proposition 4.1, the choice $\alpha_m^{opt}(\beta_0, C_0)$
defined by (39) is asymptotically equivalent to the explicit choice given by (5)
and (6) in the introduction of the paper. Numerical comparisons between the pFDR and
FDR risks obtained according to these two choices are provided in Section 2 of the
supplementary material [ ]. While the explicit choice qualitatively leads to the same results when
$m$ is large (say, $m > 1{,}000$), $\alpha_m^{opt}(\beta_0, C_0)$ is more accurate for a small $m$.
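
In the Laplace scale case the calibration reduces to a one-dimensional root-finding problem, which can be sketched as follows. The function name `laplace_scale_alpha_opt` and the bisection scheme are illustrative assumptions, not the authors' implementation; the sketch assumes m > 1, beta0 > 0 and 0 < C0 < 1.

```python
import math

def laplace_scale_alpha_opt(m, beta0, C0, tol=1e-12):
    """Return (alpha, y), where y > 1 solves
    beta0 * log(m) + log(y) = (y - 1) * log(1 / C0) and alpha = 1 / (1 + y)."""
    a = beta0 * math.log(m)
    c = math.log(1.0 / C0)

    def h(y):
        return (y - 1.0) * c - math.log(y) - a

    # h(1) = -a < 0 and h is convex with h(y) -> +infinity,
    # so h has a unique root above 1; bracket it by doubling.
    lo, hi = 1.0, 2.0
    while h(hi) <= 0.0:
        hi *= 2.0
    # Bisection, keeping h(lo) <= 0 < h(hi).
    while hi - lo > tol * hi:
        mid = 0.5 * (lo + hi)
        if h(mid) > 0.0:
            hi = mid
        else:
            lo = mid
    y = 0.5 * (lo + hi)
    return 1.0 / (1.0 + y), y
```

The returned level decreases slowly as m grows, which is the calibration effect discussed in the experiments below; the Gaussian scale case can be handled with the same bisection applied to its own equation.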

Finally, note that both choices of $\alpha_m$ (the one defined by (39) and the explicit one given by (5) and (6)) are motivated by the analysis
of the pFDR risk, not that of the FDR risk. Hence, it might be possible to choose a better
$\alpha_m$ for FDR, especially for small values of $m$ for which pFDR and FDR are different. Because
obtaining such a refinement appeared quite challenging, and as our proposed choice already
performed well, we decided not to investigate this question further.

5.3 Adapting to unknown sparsity 

In order to make our experiments comparable across parameter values, we quantify the quality
of a thresholding procedure by the ratio between the excess risk of this procedure and
the excess risk of null thresholding:

$$\mathrm{ERR}_m(t_m) = \frac{R_m(t_m) - R_m(t_m^B)}{R_m(0) - R_m(t_m^B)}. \qquad (40)$$

The excess risk ratio $\mathrm{ERR}_m(t_m)$ defined by (40) is the baseline for all our experiments. The
closer it is to 0, the better the corresponding classification procedure is. Figure 6 compares
excess risk ratios of different procedures in the Gaussian location model: Bayes procedure
with parameters $(\beta = \beta_0, C_m = C_0)$ (that is, the threshold $F_{m,0}^{-1}(C_0)$), pFDR and FDR thresholding at
level $\alpha$ for $\alpha \in \{0.05, 0.1, 0.2, \alpha_m^{opt}(\beta_0, C_0)\}$. We study the behavior of the excess risk ratio
as the (unknown) true model parameters $(\beta, C_m)$ vary in $[0,1] \times [0,1]$, and arbitrarily choose
$\beta_0$ and $C_0$ as the midpoints of the corresponding intervals, i.e. $\beta_0 = 1/2$ and $C_0 = 1/2$.
Colors reflect the value of the excess risk ratio. They range from white (low risk) to dark red 
(higher risk). Black lines represent the level set ERR = 0.2, that is, they delineate a region 
of the (/3, Cm) plane in which the excess risk of the procedure under study is five times less 
than that of null thresholding. The number at the top left of each plot gives the fraction of 
configurations (/3, C) for which ERR < 0.2. Each column in Figure 6 corresponds to a value of 
m G {25, 100, 10^, 10^, 10^}. For m = 10^, we did not undertake exact FDR risk calculations, 
but used (38) to provide an upper bound on the FDR relative risk for m = 10^. We expect 
this bound to be conservative, and the corresponding plots are marked with (*). Also note 
that FDR risk is expected to be well approximated by pFDR risk for such a large value of m. 
This is confirmed by the fact that FDR and pFDR plots at a given level a are increasingly 
similar as m increases. 

Bayes thresholding (top line) performs well when the sparsity parameter $\beta$ is correctly
specified, and its performance is fairly robust to $C_m$. However, it performs poorly when $\beta$
is misspecified, and increasingly so as $m$ increases. The results are markedly different for the
other thresholding methods. pFDR and FDR thresholding are less adaptive to $C_m$ than
Bayes thresholding, but much more adaptive to the sparsity parameter $\beta$, as illustrated by
the fact that the configurations with low ERR span the whole range of $\beta$, especially when
$\alpha = \alpha_m^{opt}(\beta_0, C_0)$.

Another striking point is that while pFDR thresholding with fixed values of $\alpha$ performs
fairly well for some values of $m$, it is outperformed by pFDR thresholding when
$\alpha = \alpha_m^{opt}(\beta_0, C_0)$. This is because this choice of $\alpha$ is calibrated as a function of $m$. The
same remark holds for FDR thresholding. Importantly, pFDR and FDR thresholding using
this calibration are increasingly adaptive to sparsity as $m$ increases. This corroborates the
results of Section 4.3 which entail that $\mathrm{ERR}_m(t_m^\star)$ and $\mathrm{ERR}_m(\hat{t}_m^{FDR})$ are $O((\log m)^{-1/2})$.

Results for Laplace and Gaussian scale models are similar. The corresponding figures
are given in Section 2 of the supplementary material [ ]. Importantly, the range of values of
$\alpha_m^{opt}(\beta_0, C_0)$ differs substantially between models: from [0.17, 0.27] in the Gaussian location
model, to [0.05, 0.12] in the Gaussian scale model and [0.06, 0.15] in the Laplace scale model.
[Figure 6: grid of panels, one column per value of m; horizontal axes from 0.0 to 1.0.]

Figure 6: Adaptation to sparsity by (p)FDR thresholding in the Gaussian location model.
Excess risk ratios $\mathrm{ERR}_m$ for various thresholding procedures (rows) and different values of
$m$ (columns). In each panel, the corresponding risk is plotted as a function of $\beta \in [0,1]$
(horizontal axis) and $C_m \in [0,1]$ (vertical axis). Colors range from white (low risk) to dark
red (high risk), as indicated by the color bar at the bottom. For FDR, panels with the largest value of $m$ are
marked with a star (*) in order to indicate that only an upper bound on ERR was calculated.
Black lines represent the level set ERR = 0.2. The point $(\beta = \beta_0, C = C_0)$ is marked by "+".
We chose $\beta_0 = 1/2$ and $C_0 = 1/2$.
5.4 Influence of the choice of parameters $\beta_0$ and $C_0$

In Figure 6, Bayes procedure and the optimal recovery parameters are calibrated using $\beta_0 =
1/2$ and $C_0 = 1/2$. The above results show that pFDR and FDR thresholding are adaptive
to the unknown sparsity, in the sense that when applied at level $\alpha_m^{opt}(\beta_0, C_0)$, they achieve a
low excess risk ratio even when the true sparsity parameter $\beta$ is not $\beta_0$.

In this section we discuss the influence of $C_0$ on the performance of pFDR and FDR
thresholding at level $\alpha_m^{opt}(\beta_0, C_0)$. Figure 7 gives the ERR for Bayes, pFDR and FDR thresholding
for $\beta_0 = 1/2$ and $C_0 \in \{1/4, 1/2, 3/4\}$. As expected, Bayes thresholding is quite robust
to the choice of $C_0$, as it achieves low ERR for all values of $C_0$. However, as mentioned
above, Bayes thresholding is quite sensitive to the specification of $\beta$, and its performance
when $\beta_0$ is misspecified decreases rapidly as $m$ increases. In contrast, pFDR thresholding at
$\alpha_m^{opt}(\beta_0, C_0)$ is more sensitive to the specification of $C_m$, and much less to the specification of
$\beta$. In particular, the regions in the $(\beta, C_m)$ plane for which ERR $\le$ 0.2 are markedly different
for $C_0 = 1/4$, $1/2$ or $3/4$, especially for small values of $m$. As $m$ increases, these low-ERR
regions widen and their overlap increases, making pFDR thresholding less sensitive to the
specification of $C_m$. FDR thresholding at $\alpha_m^{opt}(\beta_0, C_0)$ achieves a reasonably low ERR over
the whole range of values for $\beta$ and $C_m$. However, the region with low ERR is smaller for
smaller values of $C_0$. We also observe that for a given value of $C_0$, the region with low ERR
gets bigger as $m$ increases. We believe that this also holds for larger $m$, even if it cannot be
deduced from the upper bound on FDR ERR that we calculated for m = 10^. 

Results for Laplace and Gaussian scale models are similar. The corresponding figures are
given in Section 2 of the supplementary material [22].

6 Discussion 

6.1 Extension to the risk $R_m^T$

Our bounds are established for the misclassification risk $R_m$ over a new labeled data point, defined by (9), and not
for the misclassification risk $R_m^T$ over the unlabeled sample, defined by (10). Remember that
these two risks are the same for a deterministic threshold (e.g., the pFDR threshold), but can
be different for a random threshold. Hence Theorem 3.1 also holds for the risk $R_m^T$. We can
legitimately ask whether this is the case for Theorem 3.2.

As a matter of fact, we can prove that Theorem 3.2 is also true for the risk $R_m^T$: first,
for this risk, the threshold $\hat{t}_m^{FDR}$ defined by (17) has the same risk as the threshold $\hat{t}_m^{BH}$
defined by (16). This comes from the equality

$$\{1 \le i \le m : p_i \le \hat{t}_m^{FDR}\} = \{1 \le i \le m : p_i \le \hat{t}_m^{BH}\},$$

which can be easily checked. Hence we can work directly with $\hat{t}_m^{BH}$. Second, the bound for the
type I error is the same as in (51) and can be proved similarly. Third, the proof for bounding
the type II error derives essentially from the following argument, which is quite standard in
the multiple testing methodology, see e.g. [13, 14, 25, 23]. Let us denote

$$\tilde{t}_m = \max\{t \in [0,1] : \alpha_m \tilde{G}_m(t) \ge t\},$$

where $\tilde{G}_m(t) = m^{-1}\left(1 + \sum_{i=2}^{m} 1\{p_i \le t\}\right)$ denotes the empirical c.d.f. of the p-values where $p_1$
has been replaced by 0. Then, for any realization of the p-value family, $p_1 \le \hat{t}_m^{BH}$ is equivalent
to $p_1 \le \tilde{t}_m$ (see, e.g., Section 3.2 of [23]). This entails that the type II error is equal to
$\pi_{1,m}(1 - E(F_m(\tilde{t}_m)))$ (by using the exchangeability of $(H_i, p_i)_{1 \le i \le m}$). Finally, since $\tilde{t}_m \ge \hat{t}_m^{BH}$
and $\tilde{t}_m \ge \alpha_m/m$, we have $\tilde{t}_m \ge \hat{t}_m^{FDR}$. Hence $\pi_{1,m}(1 - E(F_m(\tilde{t}_m))) \le \pi_{1,m}(1 - E(F_m(\hat{t}_m^{FDR})))$
and the bounds (54) and (55) also hold for the risk $R_m^T$.

In conclusion, all the results of Sections 3 and 4 are valid using the risk $R_m^T$ instead of $R_m$.

[Figure 7: grid of panels, one column per value of m; horizontal axes from 0.0 to 1.0.]

Figure 7: Excess risk ratios (ERR) of Bayes, pFDR and FDR thresholding for the same five values of $m$
as in Figure 6 ($m = 25$, $100$ and three larger powers of 10), $\beta_0 = 1/2$ and $C_0 \in \{0.25, 0.5, 0.75\}$. In each panel, the point
$(\beta = \beta_0, C = C_0)$ is marked by "+".

6.2 Extension to weighted mis-classification risk 

In our sparse setting, where we assume that there are many more labels "0" than labels "1",
one could consider that misclassifying a "0" is less important than misclassifying a "1". This
suggests considering the following weighted risk:

$$R_{m,\lambda_m}(\hat{t}_m) = E\left(\pi_{0,m} \hat{t}_m + \lambda_m \pi_{1,m}(1 - F_m(\hat{t}_m))\right), \qquad (41)$$

for a known factor $\lambda_m \in (1, \tau_m)$. In Section 1 of the supplementary material [22], we show
that all our results can be adapted to this risk. Loosely, when considering $R_{m,\lambda_m}$ instead of
$R_m$, our results hold after replacing $\tau_m$ by $\tau_m/\lambda_m$ and $q_m$ by $q_m \lambda_m$; see the supplementary
material [ ] for precise statements.

As an illustration, let us consider here the case of a $\zeta$-Subbotin density, $\tau_m = m^{\beta}$, $\beta \in
(0,1]$, $\log \lambda_m = o(\log m)$, under the (corresponding) assumptions (BP) and (Sp). We show
that the optimal recovery parameter satisfies $q_m^{opt} \lambda_m \asymp (\log m)^{\gamma}$, where $\gamma = 1 - \zeta^{-1}$ and $\gamma = 1$
for the location and scale cases, respectively. Furthermore, we show that taking $q_m \propto q_m^{opt}$
leads to the optimality rate $\rho_m = (\log m)^{-\gamma}$ for the relative excess risk based on $R_{m,\lambda_m}$.
While the order of $q_m^{opt}$ is not modified when $\lambda_m \propto 1$, it may be substantially different when
$\lambda_m \to \infty$. Typically, $\lambda_m \propto (\log m)^{\gamma}$ leads to $q_m^{opt} \propto 1$. Hence, when considering $R_{m,\lambda_m}$ instead
of $R_m$, the value of $\lambda_m$ should be carefully taken into account when choosing $\alpha_m$ to obtain a
small excess risk.

Finally, for the $\zeta$-Subbotin density, $\tau_m = m^{\beta}$, $\beta \in (0,1]$ and $\log \lambda_m = o((\log m)^{\gamma})$, we
show in the supplementary material [22] that a sufficient condition for FDR thresholding
to be asymptotically optimal for the risk $R_{m,\lambda_m}$ is to take $q_m^{-1} = O(1)$, $q_m \lambda_m \to \infty$ and
$\log q_m = o((\log m)^{\gamma})$. This recovers Theorem 5.3 of [ ] when applied to the particular case of
a Gaussian scale model (for which $\gamma = 1$).

6.3 Case of a Gaussian density 

Let us consider the special case where $d(\cdot)$ is the standard Gaussian density. In that case,
while $\Psi_m$ is not easily invertible, an explicit expression can be derived for $f_m^{-1}$, see Table 3.
By using (22) in Remark 3.3, Theorems 3.1 and 3.2 lead to explicit upper bounds for the
excess risk of the pFDR/FDR. By contrast with the bounds derived in Section 4.2, they are
valid for any $m \ge 2$, but the quantity "$\log(q_m/q_m^{opt})$" is replaced by "$\log q_m$" (up to constant
terms). The reason is that $\gamma_m = (F_m(\Psi_m^{-1}(q_m^{opt} \tau_m)) - F_m(\Psi_m^{-1}(q_m \tau_m)))_+$ involves a variation
of $q_m$ around $q_m^{opt}$ while $C_m - F_m(f_m^{-1}(q_m \tau_m)) = F_m(f_m^{-1}(\tau_m)) - F_m(f_m^{-1}(q_m \tau_m))$ involves a
variation of $q_m$ around 1. When choosing $q_m \propto q_m^{opt}$, this method inflates the upper bound
by a log log factor w.r.t. the bounds derived in Section 4.2. Hence, we have chosen not to
report these bounds in the final manuscript.
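
Quantity | Gaussian location | Gaussian scale
Parameter | $\log\tau_m = \mu_m\bar\Phi^{-1}(C_m) + \mu_m^2/2$ | $\log\tau_m + \log\sigma_m = (\bar\Phi^{-1}(C_m/2))^2(\sigma_m^2-1)/2$
$F_m(t)$ | $\bar\Phi(\bar\Phi^{-1}(t) - \mu_m)$ | $2\bar\Phi(\bar\Phi^{-1}(t/2)/\sigma_m)$
$f_m(t)$ | $\exp(\mu_m(\bar\Phi^{-1}(t) - \mu_m/2))$ | $\sigma_m^{-1}\exp\{(1-\sigma_m^{-2})(\bar\Phi^{-1}(t/2))^2/2\}$
$f_m^{-1}(u)$ | $\bar\Phi((\log u)/\mu_m + \mu_m/2)$ | $2\bar\Phi((2\log(\sigma_m u)\,\sigma_m^2/(\sigma_m^2-1))^{1/2})$
$F_m(f_m^{-1}(q_m\tau_m))$ | $\bar\Phi((\log q_m)/\mu_m + \bar\Phi^{-1}(C_m))$ | $2\bar\Phi(((\bar\Phi^{-1}(C_m/2))^2 + 2\log q_m/(\sigma_m^2-1))^{1/2})$

Table 3: Some calculations for the Gaussian location and scale models. $\bar\Phi(x) = P(Z \geq x)$ for $Z \sim N(0,1)$; $t \in (0,1)$; $u > 0$.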
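
The closed-form expressions for the Gaussian location and scale models can be checked numerically. The following is a minimal sketch using only the Python standard library; the function names (`sf`, `isf`, `F_loc`, `f_loc`, and so on) are illustrative choices and do not come from the paper, and the formulas coded are the standard ones for one-sided p-values $\bar\Phi(X)$ (location) and two-sided p-values $2\bar\Phi(|X|)$ (scale).

```python
import math
from statistics import NormalDist

_STD = NormalDist()  # standard Gaussian N(0, 1)

def sf(x):
    """Upper tail P(Z >= x), computed with erfc for accuracy in the tail."""
    return 0.5 * math.erfc(x / math.sqrt(2.0))

def isf(t):
    """Inverse upper tail: the x such that P(Z >= x) = t, for t in (0, 1)."""
    return -_STD.inv_cdf(t)

# Gaussian location model, mean shift mu > 0 (hypothetical helper names).
def F_loc(t, mu):
    return sf(isf(t) - mu)

def f_loc(t, mu):
    return math.exp(mu * (isf(t) - mu / 2.0))

def f_loc_inv(u, mu):
    return sf(math.log(u) / mu + mu / 2.0)

# Gaussian scale model, standard deviation sigma > 1.
def F_scale(t, sigma):
    return 2.0 * sf(isf(t / 2.0) / sigma)

def f_scale(t, sigma):
    return math.exp((1.0 - sigma ** -2) * isf(t / 2.0) ** 2 / 2.0) / sigma

def f_scale_inv(u, sigma):
    # Only meaningful for u >= 1/sigma, so that the square root is real.
    x = math.sqrt(2.0 * math.log(sigma * u) * sigma ** 2 / (sigma ** 2 - 1.0))
    return 2.0 * sf(x)

def numeric_derivative(F, t, h=1e-6):
    """Central finite difference, used to compare F' with the density f."""
    return (F(t + h) - F(t - h)) / (2.0 * h)
```

For instance, `f_loc(f_loc_inv(3.0, 2.0), 2.0)` should return a value very close to 3, and `numeric_derivative(lambda t: F_loc(t, 1.5), 0.1)` should agree with `f_loc(0.1, 1.5)` up to finite-difference error.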



6.4 Laplace location model 

According to Remark 2.1, our results do not cover the case of the Laplace location model because $\phi(u) = u + \log 2$ is not strictly convex. In this case, while the optimal classification procedures are still thresholding procedures, the Bayes threshold is 0 or 1 whenever $\tau_m \geq e^{\mu_m}$ or $\tau_m \leq e^{-\mu_m}$, respectively. This can be derived from the exact expression provided in Proposition 25 of [ ] (item 3). Nevertheless, the Bayes threshold is still unique in $(0,1)$ as soon as the parameters $(\tau_m, \mu_m)$ satisfy the constraint

$$e^{-\mu_m} < \tau_m < e^{\mu_m}. \qquad (42)$$

Moreover, this entails $1/2 < C_m < 1 - e^{-\mu_m}/2$, $q_m^{opt} = C_m/(1 - C_m)$ and $R_m(t^B_m) = 2\pi_{1,m}(1 - C_m)$. In particular, one major difference with the cases considered in this paper is that $q_m^{opt}$
does not tend to infinity under (BP) and (Sp). Also, we have = 1 as defined in (23). 
Under Assumption (42), Theorems 3.1 and 3.2 can be readily applied to obtain upper bounds for the excess risk of pFDR/FDR thresholding. While this proves that pFDR thresholding is still asymptotically optimal when choosing $q_m - q_m^{opt} = o(1)$, we cannot derive directly such a statement for FDR thresholding. This comes from the fact that we used a "one-sided" concentration argument while bounding the type I error. Rather, we would need a "two-sided" concentration argument, which seems feasible but may be technical.
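
The relations stated under constraint (42) can be verified on a concrete example. The sketch below assumes the Laplace density $d(x) = e^{-|x|}/2$, for which the alternative p-value density equals $\exp(2x - \mu_m)$ at $x = D^{-1}(t) \in (0, \mu_m)$; the function name and the numerical values are illustrative and not taken from the paper.

```python
import math

def laplace_location_bayes(tau, mu):
    """Bayes threshold, Bayes power, optimal recovery parameter and Bayes risk
    in the Laplace location model, assuming exp(-mu) < tau < exp(mu)."""
    if not (math.exp(-mu) < tau < math.exp(mu)):
        raise ValueError("constraint (42) is violated")
    pi1 = 1.0 / (1.0 + tau)                      # proportion of alternatives
    pi0 = tau * pi1                              # proportion of nulls
    x_star = (mu + math.log(tau)) / 2.0          # solves exp(2x - mu) = tau
    t_bayes = math.exp(-x_star) / 2.0            # t^B = D(x_star)
    power = 1.0 - math.exp(x_star - mu) / 2.0    # C = D(x_star - mu)
    q_opt = power / (tau * t_bayes)              # q^opt = C / (tau * t^B)
    risk = pi0 * t_bayes + pi1 * (1.0 - power)   # R(t^B)
    return t_bayes, power, q_opt, risk, pi1

# Example with illustrative values tau = 3, mu = 2.
t_b, C, q_opt, risk, pi1 = laplace_location_bayes(3.0, 2.0)
print(q_opt, C / (1.0 - C))          # the two values coincide
print(risk, 2.0 * pi1 * (1.0 - C))   # the two values coincide
```

The example shows in particular that $q_m^{opt}$ depends on the parameters only through $C_m$, so it stays bounded whenever the Bayes power is bounded away from 1.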

We have also performed numerical experiments for the Laplace location model, see Figure 3 in the supplementary material [ ]. These experiments show that this model is somewhat singular: while the adaptation w.r.t. $\beta$ is stronger than for the other models (ERR is even independent of $\beta$ for pFDR thresholding), the sensitivity to the mis-specification of $C_m$ is much higher. This behavior is in agreement with the expression of $q_m^{opt}$, which involves $C_m$ but not $\beta$.



6.5 Asymptotically minimax relative excess risk 

Let us denote the relative excess risk by $\mathcal{E}_m(\hat t_m) = (R_m(\hat t_m) - R_m(t^B_m))/R_m(t^B_m)$ and consider the sparsity range $\tau_m = m^{\beta}$, $\beta \in B$, for a subset $B$ of $(0,1]$ containing at least two elements. Let us focus on the Laplace scale model. We showed in Section 4 that, under (BP), (Sp) and by taking $\alpha_m \propto (\log m)^{-1}$, there exists some constant $D > 0$ such that for $m \geq 2$,

$$\sup_{\beta \in B}\{\mathcal{E}_m(\hat t^{BH}_m(\alpha_m))\} \leq \frac{D}{\log m}. \qquad (43)$$
Furthermore, (36) shows that the rate in (43) is not improvable over the class of pFDR 
procedures using an arbitrary nominal level $\alpha_m \in (0,1)$. An interesting open problem for future research is to determine whether there exists a procedure $\hat t_m$ achieving a faster rate than $(\log m)^{-1}$. We might conjecture that this is not the case, i.e., that there exists some constant $D' > 0$ such that for $m \geq 2$,

$$\inf_{\hat t_m}\Big\{\sup_{\beta \in B}\{\mathcal{E}_m(\hat t_m)\}\Big\} \geq \frac{D'}{\log m}, \qquad (44)$$

where the infimum is taken over any thresholding procedure $\hat t_m : [0,1]^m \to [0,1]$ taking as input the p-value family. The latter, combined with (43), would show that FDR thresholding is asymptotically minimax in terms of relative excess risk. This would be more accurate than a result of the form (7) and is thus an interesting direction for future investigations.

6.6 Case of other FDR controlling procedures 

The present paper focuses on the seminal FDR controlling procedure proposed by Benjamini and Hochberg [ ], which is based on Simes' line [29]. However, many other procedures have been proved to control the FDR while proposing some refinements over [ ], for instance, step-up-down procedures, see, e.g., [31, 26], procedures adaptive to $\pi_{0,m}$, see, e.g., [ , 27, 5, 14, 16], or procedures adaptive to the alternative c.d.f. $F_m$, see [ ]. We believe that some of these procedures are also adaptive to unknown sparsity, and may outperform [ ] as a classification rule. This is an interesting avenue for future research.

7 Proofs of Theorem 3.1 and Theorem 3.2 

7.1 Relations for pFDR 

Let us first state the following result. 

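Proposition 7.1. Consider the setting and the notation of Theorem 3.1. Then we have

1. for any $m \geq 2$,

$$R_m(t^\star_m) - R_m(t^B_m) = \pi_{1,m}C_m/q_m - \pi_{0,m}t^B_m + \pi_{1,m}(1 - q_m^{-1})(C_m - F_m(t^\star_m)); \qquad (45)$$

2. if $\alpha_m \leq 1/2$, we have for any $m \geq 2$,

$$R_m(t^\star_m) - R_m(t^B_m) \leq \pi_{1,m}C_m/q_m - \pi_{0,m}t^B_m + \pi_{1,m}(1 - q_m^{-1})\gamma_m \qquad (46)$$
$$R_m(t^\star_m) - R_m(t^B_m) \leq \pi_{1,m}\{(C_m/q_m - \tau_m t^B_m) \vee \gamma_m\}. \qquad (47)$$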

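Proof. To prove (45), we use $F_m(t^\star_m) = t^\star_m q_m\tau_m$ and $\tau_m = \pi_{0,m}/\pi_{1,m}$ to write

$$R_m(t^\star_m) - R_m(t^B_m) = \pi_{0,m}t^\star_m - \pi_{0,m}t^B_m + \pi_{1,m}(C_m - F_m(t^\star_m)) \qquad (48)$$
$$= \pi_{1,m}F_m(t^\star_m)/q_m - \pi_{0,m}t^B_m + \pi_{1,m}(C_m - F_m(t^\star_m)).$$

Expression (46) is an easy consequence of (45). Finally, (48) and (45) entail

$$R_m(t^\star_m) - R_m(t^B_m) \leq \begin{cases} \pi_{1,m}C_m/q_m - \pi_{0,m}t^B_m & \text{if } q_m \leq q_m^{opt}, \\ \pi_{1,m}\gamma_m & \text{if } q_m \geq q_m^{opt}, \end{cases}$$

which yields (47) because $\pi_{0,m}t^B_m = \pi_{1,m}C_m/q_m^{opt}$ by definition.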

□ 
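
The decomposition used in this proof can be checked numerically. The sketch below works in the Gaussian location model with illustrative parameter values (an assumption, the proof itself is model-free); it solves $F_m(t)/t = q_m\tau_m$ by bisection to get the pFDR threshold, then compares the excess risk with the expression obtained by combining (48) with $F_m(t^\star_m) = t^\star_m q_m\tau_m$, and with the upper bound (47). All function and variable names are hypothetical.

```python
import math
from statistics import NormalDist

_STD = NormalDist()

def sf(x):
    return 0.5 * math.erfc(x / math.sqrt(2.0))

def isf(t):
    return -_STD.inv_cdf(t)

def pfdr_excess_risk(mu=2.0, tau=20.0, q=5.0):
    """Return (excess risk, decomposition value, bound (47)) for q >= 1."""
    pi1 = 1.0 / (1.0 + tau)
    pi0 = tau * pi1
    F = lambda t: sf(isf(t) - mu)                 # alternative c.d.f.
    risk = lambda t: pi0 * t + pi1 * (1.0 - F(t))
    t_bayes = sf(math.log(tau) / mu + mu / 2.0)   # solves f(t) = tau
    power = F(t_bayes)                            # Bayes power C
    # F(t)/t decreases in t: bisection on log t for F(t)/t = q * tau.
    lo, hi = math.log(1e-200), math.log(0.999999)
    for _ in range(200):
        mid = 0.5 * (lo + hi)
        t = math.exp(mid)
        if F(t) / t > q * tau:
            lo = mid      # ratio too large, the threshold is larger
        else:
            hi = mid
    t_star = math.exp(0.5 * (lo + hi))
    excess = risk(t_star) - risk(t_bayes)
    decomposition = (pi1 * power / q - pi0 * t_bayes
                     + pi1 * (1.0 - 1.0 / q) * (power - F(t_star)))
    gamma = max(power - F(t_star), 0.0)
    bound = pi1 * max(power / q - tau * t_bayes, gamma)
    return excess, decomposition, bound
```

With the default values, the first two returned numbers agree up to rounding and the third one dominates them, both when $q_m$ is above $q_m^{opt}$ (for instance `q=5.0`) and when it is below (for instance `q=1.5`).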
7.2 Proof of Theorem 3.1

Theorem 3.1 (i) follows from (47). Let us now prove (ii). First note that

$$R_m(t^\star_m) = \pi_{1,m} - \pi_{1,m}F_m(t^\star_m)(1 - q_m^{-1}). \qquad (49)$$

Using (49) and the upper bound $t^\star_m = F_m(t^\star_m)(q_m\tau_m)^{-1} \leq (q_m\tau_m)^{-1}$, we obtain

$$R_m(t^\star_m) \geq \pi_{1,m}(1 - (1 - q_m^{-1})_+F_m(t^\star_m)) \geq \pi_{1,m}(1 - (1 - q_m^{-1})_+F_m(q_m^{-1}\tau_m^{-1})).$$

This entails (19).

7.3 Proof of Theorem 3.2 

Write $\hat t_m$ instead of $\hat t^{BH}_m$ for short. We prove the following oracle inequality, which is slightly more accurate than (21): for any $m \geq 2$,



+^i,m7m A {7m + exp{-m62(T^ + l)-\Cm - 7m) /4}} • (50) 

Inequality (21) is a consequence of (50) where $\tau_m t^B_m$ has been lower-bounded by 0.

To establish (50), let us first write the risk of FDR thresholding as $R_m(\hat t_m) = T_{1,m} + T_{2,m}$, with $T_{1,m} = \pi_{0,m}E(\hat t_m)$ and $T_{2,m} = \pi_{1,m}(1 - E(F_m(\hat t_m)))$. In the sequel, $T_{1,m}$ and $T_{2,m}$ are examined separately.
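
For concreteness, the FDR threshold whose risk is decomposed above can be sketched as follows. This is a minimal illustration of the Benjamini-Hochberg step-up threshold, floored at $\alpha_m/m$ as recalled in the proof of Proposition 7.2; the function names are hypothetical and the input p-values are made up for the example.

```python
def bh_threshold(pvalues, alpha):
    """Step-up threshold alpha * k_hat / m, floored at alpha / m, where
    k_hat is the largest k with p_(k) <= alpha * k / m (0 if none)."""
    m = len(pvalues)
    k_hat = 0
    for k, p in enumerate(sorted(pvalues), start=1):
        if p <= alpha * k / m:
            k_hat = k
    return max(alpha * k_hat / m, alpha / m)

def classify(pvalues, alpha):
    """Label 1 (alternative) the items whose p-value is below the threshold."""
    t_hat = bh_threshold(pvalues, alpha)
    return [1 if p <= t_hat else 0 for p in pvalues]

# Illustrative p-values: two small ones pass the step-up criterion.
example = [0.9, 0.001, 0.041, 0.008, 0.039]
print(bh_threshold(example, 0.05))
print(classify(example, 0.05))
```

In this example the two smallest p-values satisfy the step-up criterion, so the threshold is $0.05 \times 2/5$ and exactly those two items are labeled 1.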

7.3.1 Bounding $T_{1,m}$

The next result is a variation of Lemma 7.1 and Lemma 7.2 in [6].
Proposition 7.2. The following bound holds:

m ^ ^rn , -1 /ri\ 



Proof. To prove Proposition 7.2, we follow the proof of Lemma 7.1 in [ ] with slight simplifications. Remember that, by definition, $\hat t_m$ is the maximum of the step-up threshold defined by (16) and of $\alpha_m/m$. Since $\alpha_m/m$ is deterministic and always smaller than the RHS in (51), and by integrating w.r.t. the label vector $H$, it is sufficient to prove

Let $m_1(H) = \sum_{i=1}^m H_i$ and $m_0(H) = m - m_1(H)$. By exchangeability of $(p_i, H_i)_i$, we can assume without loss of generality that the p-values corresponding to a label $H_i = 0$ are $p_1, \dots, p_{m_0(H)}$, for simplicity. Let us denote by $\hat t_{m,0}$ the threshold of the step-up procedure applied to the p-values $p_1, \dots, p_{m_0(H)}$ and using the critical values $\alpha_m(m_1(H) + k)/m$, $k = 1, \dots, m_0(H)$. That is,

$$\hat t_{m,0} = \alpha_m(m_1(H) + \hat k_{m,0})/m,$$

where $\hat k_{m,0} = \max\{k \in \{0, 1, \dots, m_0(H)\} : p_{(k)} \leq \alpha_m(m_1(H) + k)/m\}$. A classical result in multiple testing is that $\hat t_{m,0}$ is equal to the thresholding defined by (16), applied to the p-value family $p'_i$, $1 \leq i \leq m$, in which each of the p-values $p_{m_0(H)+1}, \dots, p_m$ has been replaced by 0 (see, e.g., Lemma 7.1 in [ ]). Moreover, since the threshold defined by (16) is non-increasing in each p-value, setting some p-values equal to 0 can only increase it. This entails

E(t^^ I H) < E{im,o I H) = am{mi{H) + E(A:^,o | H))/m. (53) 

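Next, we may use Lemma 4.2 in [ ] (by taking "$n = m_0(H)$, $\beta = \alpha_m$, $\tau = \alpha_m/m$" with their notation), to derive that for any $H \in \{0,1\}^m$,

$$E(\hat k_{m,0} \mid H) \leq \alpha_m \sum_{i=0}^{m_0(H)-1} \binom{m_0(H)-1}{i}(m_1(H) + i + 1)\, i!\, \Big(\frac{\alpha_m}{m}\Big)^{i}$$
$$\leq \alpha_m \sum_{i \geq 0}(m_1(H) + i + 1)\alpha_m^{i}$$
$$= \alpha_m\big(m_1(H)/(1 - \alpha_m) + 1/(1 - \alpha_m)^2\big).$$

The bound (52) thus follows from (53).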

□ 
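
The last equality in the proof above rests on the two power series $\sum_{i \geq 0}\alpha^i = 1/(1-\alpha)$ and $\sum_{i \geq 0}(i+1)\alpha^i = 1/(1-\alpha)^2$. A short numerical check, with hypothetical function names and illustrative values of $m_1(H)$ and $\alpha_m$, is:

```python
def series_bound(m1, alpha, terms=2000):
    """Truncated version of alpha * sum_{i >= 0} (m1 + i + 1) * alpha**i."""
    return alpha * sum((m1 + i + 1) * alpha ** i for i in range(terms))

def closed_form(m1, alpha):
    """Closed form alpha * (m1 / (1 - alpha) + 1 / (1 - alpha)**2)."""
    return alpha * (m1 / (1.0 - alpha) + 1.0 / (1.0 - alpha) ** 2)

# Illustrative values: 7 true alternatives, level alpha = 0.3.
print(series_bound(7, 0.3), closed_form(7, 0.3))
```

Both printed values agree up to rounding, since the truncation error is negligible for $\alpha_m$ well below 1.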

7.3.2 Bounding $T_{2,m}$

Let us consider $t^{\varepsilon}_m$ the pFDR threshold associated to level $\alpha_m\pi_{0,m}(1 - \varepsilon)$. Note that by definition of $t^{\varepsilon}_m$ we have $\pi_{0,m}(1 - \varepsilon)G_m(t^{\varepsilon}_m) = t^{\varepsilon}_m/\alpha_m$. Here, we state the following proposition, which, combined with Proposition 7.2, establishes Theorem 3.2.

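Proposition 7.3. Let $t^{\varepsilon}_m$ denote the pFDR threshold at level $\alpha_m\pi_{0,m}(1 - \varepsilon)$. Then the following bounds hold:

$$T_{2,m} \leq \pi_{1,m}(1 - F_m(\alpha_m/m)); \qquad (54)$$
$$T_{2,m} \leq \pi_{1,m}(1 - F_m(t^{\varepsilon}_m)) + \pi_{1,m}\exp\{-m(\tau_m + 1)^{-1}(C_m - \gamma^{\varepsilon}_m)\varepsilon^2/4\}. \qquad (55)$$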

To prove Proposition 7.3, let us first state the following lemma. 

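Lemma 7.4. The following bound holds:

$$P(\hat t_m < t^{\varepsilon}_m) \leq \exp\{-mG_m(t^{\varepsilon}_m)\varepsilon^2/4\}. \qquad (56)$$

We can show that Lemma 7.4 implies Proposition 7.3 as follows. First, (54) is an easy consequence of $\hat t_m \geq \alpha_m/m$. Second, expression (55) derives from (56) because $F_m(\hat t_m) \geq F_m(t^{\varepsilon}_m)$ on the event $\{\hat t_m \geq t^{\varepsilon}_m\}$ and because $G_m(t^{\varepsilon}_m) \geq \pi_{1,m}F_m(t^{\varepsilon}_m) \geq (\tau_m + 1)^{-1}(C_m - \gamma^{\varepsilon}_m)$, where $\gamma^{\varepsilon}_m = (C_m - F_m(t^{\varepsilon}_m))_+$.
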
Finally, we prove Lemma 7.4 by using a variation of the method described in the proof of Theorem 1 in [ ] (we use Bennett's inequality instead of Hoeffding's inequality). For any $t_0 \in (0,1)$ such that $t_0/\alpha_m - G_m(t_0) < 0$, we have

$$P(\hat t_m < t_0) \leq P(\hat G_m(t_0) < t_0/\alpha_m)$$
$$\leq P(\hat G_m(t_0) - G_m(t_0) < t_0/\alpha_m - G_m(t_0)).$$
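Next, by using Bennett's inequality (see, e.g., Proposition 2.8 in [ ]) and by letting $h(u) = (1 + u)\log(1 + u) - u$, for any $u > 0$, we obtain

$$P(\hat t_m < t_0) \leq \exp\Big\{-mG_m(t_0)\,h\Big(\frac{G_m(t_0) - t_0/\alpha_m}{G_m(t_0)}\Big)\Big\}.$$

Finally, for $t_0 = t^{\varepsilon}_m$, since we have $G_m(t^{\varepsilon}_m) - t^{\varepsilon}_m/\alpha_m = (1 - \pi_{0,m}(1 - \varepsilon))G_m(t^{\varepsilon}_m) \geq \varepsilon G_m(t^{\varepsilon}_m)$, we obtain (56) by using that $h$ is increasing and that $h(u) \geq u^2/4$ for any $u \in (0,1]$, which covers the argument of $h$ above.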
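
The elementary inequality on $h$ behind the exponent $\varepsilon^2/4$ in (56) can be checked on a grid. The sketch below is only an illustration: it verifies $h(u) \geq u^2/4$ on $(0,1]$, which is the range relevant here since the argument of $h$ is a ratio in $(0,1)$, and evaluates the resulting bound for made-up values of $m$, $G_m(t^{\varepsilon}_m)$ and $\varepsilon$. The function names are hypothetical.

```python
import math

def bennett_h(u):
    """Bennett's function h(u) = (1 + u) log(1 + u) - u."""
    return (1.0 + u) * math.log1p(u) - u

def lemma_7_4_bound(m, G, eps):
    """Right-hand side of (56): exp(-m * G * eps**2 / 4)."""
    return math.exp(-m * G * eps ** 2 / 4.0)

# h(u) >= u^2 / 4 on a grid of (0, 1].
grid = [k / 1000.0 for k in range(1, 1001)]
print(all(bennett_h(u) >= u * u / 4.0 for u in grid))

# Illustrative values: m = 1000 items, G = 0.1, eps = 0.5.
print(lemma_7_4_bound(1000, 0.1, 0.5))
```

The bound decays exponentially in $mG_m(t^{\varepsilon}_m)$, which is why the corresponding term is negligible as long as the expected number of rejections at $t^{\varepsilon}_m$ grows.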

8 Proofs for location and scale models

8.1 Proof of $(A(F_m, \tau_m))$

First, assume $(A'(\phi))$ and consider the location model: we easily check that

$$f_m(t) = \exp\{\phi(|D^{-1}(t)|) - \phi(|D^{-1}(t) - \mu_m|)\}.$$

Thus for $t$ such that $D^{-1}(t) > \mu_m$, we have $\log f_m(t) = \phi(D^{-1}(t)) - \phi(D^{-1}(t) - \mu_m) \geq \phi'(D^{-1}(t) - \mu_m)\mu_m$, by using the convexity of $\phi$. Since $\lim_{+\infty}\phi' = +\infty$, we obtain $f_m(0^+) = +\infty$. For $t$ such that $D^{-1}(t) < 0$, $-\log f_m(t) = \phi(-D^{-1}(t) + \mu_m) - \phi(-D^{-1}(t)) \geq \phi'(-D^{-1}(t))\mu_m$. Hence we also have $f_m(1^-) = 0$. Furthermore, $f_m$ is decreasing because $\phi$ is strictly convex and increasing under $(A'(\phi))$.

Second, assume $(A(\phi))$ and consider the scale model. In this case, we have

$$f_m(t) = \sigma_m^{-1}\exp\big[\phi(D^{-1}(t/2)) - \phi(D^{-1}(t/2)/\sigma_m)\big].$$

Thus $f_m(1) = \sigma_m^{-1} < 1$. By using the convexity of $\phi$, we have $\log(\sigma_m f_m(t)) = \phi(D^{-1}(t/2)) - \phi(D^{-1}(t/2)/\sigma_m) \geq (1 - \sigma_m^{-1})D^{-1}(t/2)\phi'(D^{-1}(t/2)/\sigma_m)$. Hence $f_m(0^+) = +\infty$. Finally, $f_m$ is decreasing because $\phi$ is convex.



8.2 Proof of Proposition 4.1 

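Lemma 8.1. Consider the location model with a density $d(x) = e^{-\phi(|x|)}$ for a function $\phi$ satisfying $(A'(\phi))$. Then we have for any $m \geq 2$,

$$\mu_m = \phi^{-1}\big(\log\tau_m + \phi(|D^{-1}(C_m)|)\big) - D^{-1}(C_m) \qquad (57)$$
$$t^B_m \leq \tau_m^{-1}\frac{d(D^{-1}(C_m))}{\phi'(D^{-1}(C_m) + \mu_m)} \qquad (58)$$
$$t^B_m \geq \tau_m^{-1}\frac{d(D^{-1}(C_m))}{\phi'(D^{-1}(C_m) + \mu_m)}\Big(1 + \frac{\phi''}{(\phi')^2}(D^{-1}(C_m) + \mu_m)\Big)^{-1}, \text{ if } \phi \text{ satisfies } (B(\phi)) \qquad (59)$$
$$R_m(t^B_m) \leq \pi_{1,m}\Big(\frac{d(D^{-1}(C_m))}{\phi'(D^{-1}(C_m) + \mu_m)} + 1 - C_m\Big). \qquad (60)$$

If (BP) and (Sp) hold, we have $t^B_m = O(\pi_{1,m}/\phi'(D^{-1}(C_m) + \mu_m))$ and $R_m(t^B_m) \sim \pi_{1,m}(1 - C_m)$. If additionally $(B(\phi))$ holds, we have $\tau_m t^B_m \sim d(D^{-1}(C_m))/\phi'(D^{-1}(C_m) + \mu_m)$.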
Proof. First, since $C_m = F_m(t^B_m) = D(D^{-1}(t^B_m) - \mu_m)$, we have

$$\tau_m = f_m(t^B_m) = \exp\{\phi(|D^{-1}(C_m) + \mu_m|) - \phi(|D^{-1}(C_m)|)\}.$$

Since $\tau_m \geq 1$ and $\phi$ is increasing, we get $|D^{-1}(C_m) + \mu_m| \geq |D^{-1}(C_m)|$. Then, we note that for any $a > 0$ and $b \in \mathbb{R}$, $|b + a| \geq |b|$ holds only if $a + b \geq 0$. This provides that $D^{-1}(C_m) + \mu_m \geq 0$ and yields (57). Next, we have $t^B_m = F_m^{-1}(C_m) = D(D^{-1}(C_m) + \mu_m)$. First, using (70), we obtain that $t^B_m \leq d(D^{-1}(C_m) + \mu_m)/\phi'(D^{-1}(C_m) + \mu_m)$. Since $\tau_m d(D^{-1}(C_m) + \mu_m) = d(D^{-1}(C_m))$, we obtain (58) and then (60). Second, if $\phi$ satisfies $(B(\phi))$ we can apply (72) to get (59). To finish the proof, we only have to prove that $(B(\phi))$ implies that $\lim_{\infty}\phi''/(\phi')^2 = 0$: if $(B(\phi))$ holds then $\lim_{\infty}\phi'$ exists in $(0, \infty]$ and thus $h = -1/\phi'$ is non-decreasing concave with a finite limit at $\infty$. This entails that $h' = \phi''/(\phi')^2$ tends to zero at $\infty$. □

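Lemma 8.2. Consider the scale model with a density $d(x) = e^{-\phi(|x|)}$ for a function $\phi$ satisfying $(A(\phi))$. Then, we have for any $m \geq 2$,

$$\log\tau_m = -\log\sigma_m + \phi(D^{-1}(C_m/2)\sigma_m) - \phi(D^{-1}(C_m/2)) \qquad (61)$$
$$\sigma_m \geq \phi^{-1}\big(\log\tau_m + \phi(D^{-1}(C_m/2))\big)/D^{-1}(C_m/2) \qquad (62)$$
$$t^B_m \leq \tau_m^{-1}\frac{2d(D^{-1}(C_m/2))}{\sigma_m\phi'(D^{-1}(C_m/2)\sigma_m)} \qquad (63)$$
$$t^B_m \geq \tau_m^{-1}\frac{2d(D^{-1}(C_m/2))}{\sigma_m\phi'(D^{-1}(C_m/2)\sigma_m)}\Big(1 + \frac{\phi''}{(\phi')^2}(D^{-1}(C_m/2)\sigma_m)\Big)^{-1}, \text{ if } \phi \text{ satisfies } (B(\phi)) \qquad (64)$$
$$R_m(t^B_m) \leq \pi_{1,m}\Big(\frac{2d(D^{-1}(C_m/2))}{\sigma_m\phi'(D^{-1}(C_m/2)\sigma_m)} + 1 - C_m\Big). \qquad (65)$$

In particular, if (BP) and (Sp) hold, we have $\log\tau_m \sim \phi(D^{-1}(C_m/2)\sigma_m)$, $t^B_m = O(\pi_{1,m}/(\sigma_m\phi'(D^{-1}(C_m/2)\sigma_m)))$ and $R_m(t^B_m) \sim \pi_{1,m}(1 - C_m)$. If additionally $(B(\phi))$ holds, we have $\tau_m t^B_m \sim 2d(D^{-1}(C_m/2))/(\sigma_m\phi'(D^{-1}(C_m/2)\sigma_m))$.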

Proof. First, since $C_m = F_m(t^B_m) = 2D(D^{-1}(t^B_m/2)/\sigma_m)$, we have

$$\tau_m = f_m(t^B_m) = \sigma_m^{-1}\exp\{\phi(D^{-1}(C_m/2)\sigma_m) - \phi(D^{-1}(C_m/2))\}$$

and thus (61) holds. Since $\log\sigma_m \geq 0$, we get (62). Next, using (70), we obtain that $t^B_m = F_m^{-1}(C_m) = 2D(D^{-1}(C_m/2)\sigma_m) \leq 2d(D^{-1}(C_m/2)\sigma_m)/\phi'(D^{-1}(C_m/2)\sigma_m)$. Since we have $\sigma_m\tau_m d(D^{-1}(C_m/2)\sigma_m) = d(D^{-1}(C_m/2))$ by (61) and by (62), we obtain (63), and then (65). Expression (64) is derived similarly by using (72). Finally, if (BP) and (Sp) hold, we obtain $\log\tau_m \sim \phi(D^{-1}(C_m/2)\sigma_m)$ by applying (61) and by noting that $\phi(x) - \log x \sim \phi(x)$ as $x$ tends to infinity because $\phi(x)/x \geq \phi'(1) > 0$ for $x \geq 1$. The remaining statements are then straightforward. □
8.3 Proof of Corollary 4.3

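Proof for (i): from Theorem 3.1 (i), to show (28), we only have to prove that $\gamma_m = (C_m - F_m(\Psi_m^{-1}(q_m\tau_m)))_+$ satisfies

$$\gamma_m \leq K_m(\log(q_m/q_m^{opt}) - \log\nu)_+/r_m. \qquad (66)$$

When $q_m \leq q_m^{opt}$, this is trivial because $\gamma_m = 0$. Assume now $q_m > q_m^{opt}$, so that $\gamma_m = C_m - F_m(\Psi_m^{-1}(q_m\tau_m)) = F_m(\Psi_m^{-1}(q_m^{opt}\tau_m)) - F_m(\Psi_m^{-1}(q_m\tau_m)) > 0$. To prove (66), we apply Lemma 8.3 (below) with $\eta_m = C_m(1 - \nu)$ to get that

$$\log\Big(\frac{\Psi_m \circ F_m^{-1}(C_m\nu)}{\Psi_m \circ F_m^{-1}(C_m)}\Big) \geq \log\nu + \frac{C_m(1 - \nu)r_m}{K_m} \geq \log(q_m/q_m^{opt}), \qquad (67)$$

where the last inequality holds by assumption. We thus obtain $\gamma_m \leq C_m(1 - \nu)$ by inverting (67) because $q_m/q_m^{opt} = \Psi_m \circ F_m^{-1}(C_m - \gamma_m)/\Psi_m \circ F_m^{-1}(C_m)$. We can thus apply Lemma 8.3 once again, this time for $\eta_m = \gamma_m$; we obtain

$$\log(q_m/q_m^{opt}) \geq \log\nu + \gamma_m r_m/K_m.$$

This implies (66).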

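Proof for (ii): We apply Theorem 3.2. Let us prove (29) for $a = 1$. Let $q^{\varepsilon}_m = (\alpha_m\pi_{0,m}(1 - \varepsilon))^{-1} - 1 \leq (\alpha_m\pi_{0,m}(1 - \varepsilon))^{-1}$ and $\gamma^{\varepsilon}_m = (C_m - F_m(\Psi_m^{-1}(q^{\varepsilon}_m\tau_m)))_+$. From the same reasoning as for (i) above, we obtain $\gamma^{\varepsilon}_m \leq C_m(1 - \nu)$ and $\gamma^{\varepsilon}_m \leq K_m(\log(q^{\varepsilon}_m/q_m^{opt}) - \log\nu)_+/r_m$ because $r_m \geq C_m^{-1}(1 - \nu)^{-1}K_m(\log(q^{\varepsilon}_m/q_m^{opt}) - \log\nu)$. This yields (29) in the case $a = 1$.

Now let us prove (29) for $a = 2$. First note that $\alpha_m/m = \Psi_m^{-1}(q'_m\tau_m)$ where we let $q'_m = \tau_m^{-1}m\alpha_m^{-1}F_m(\alpha_m/m)$. Hence, $\gamma'_m = (C_m - F_m(\alpha_m/m))_+ = (C_m - F_m(\Psi_m^{-1}(q'_m\tau_m)))_+$. Assume $\alpha_m/m < t^B_m$ (otherwise $\gamma'_m = 0$ and the result is trivial). From the same reasoning as for (i), we can show $\gamma'_m \leq K_m(\log(q'_m/q_m^{opt}) - \log\nu)_+/r_m$. Hence the result comes from $q'_m \leq \tau_m^{-1}m\alpha_m^{-1}C_m$ because $F_m(\alpha_m/m) \leq F_m(t^B_m) = C_m$.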

We now state and prove Lemma 8.3.

Lemma 8.3. Consider the setting of Corollary 4.3. Let $\eta_m$ be such that $0 \leq \eta_m \leq C_m(1 - \nu)$, for some $\nu \in (0,1)$. Then, we have

$$\log\Big(\frac{\Psi_m \circ F_m^{-1}(C_m - \eta_m)}{\Psi_m \circ F_m^{-1}(C_m)}\Big) \geq \log\nu + \frac{\eta_m r_m}{K_m}. \qquad (68)$$



Proof. Let us prove the result for the location model (the scale case is similar). Let us first note that the function $-\log D$ is increasing on $\mathbb{R}$ and also convex on $(0, +\infty)$, because its second derivative on $(0, +\infty)$ is $d \times (-D\phi' + d)/D^2$, which is non-negative by (70). Next, since $\Psi_m \circ F_m^{-1}(t) = t/D(D^{-1}(t) + \mu_m)$, we have

$$\log\Big(\frac{\Psi_m \circ F_m^{-1}(C_m - \eta_m)}{\Psi_m \circ F_m^{-1}(C_m)}\Big) = \log\Big(\frac{C_m - \eta_m}{C_m}\Big) - \log\Big(\frac{D(D^{-1}(C_m - \eta_m) + \mu_m)}{D(D^{-1}(C_m) + \mu_m)}\Big)$$
$$\geq \log\nu + (D^{-1}(C_m - \eta_m) - D^{-1}(C_m))\,\phi'(D^{-1}(C_m) + \mu_m),$$

by using that $D^{-1}(C_m) + \mu_m \geq 0$ (as stated in Lemma 8.1), the convexity of $-\log D$ on $(0, +\infty)$ and that the derivative $d/D$ of $-\log D$ on $(0, +\infty)$ satisfies $d/D \geq \phi'$ (by using again (70)). Finally, since $-D^{-1}$ is increasing and of derivative $1/d(D^{-1}(\cdot)) \geq 1/d(0)$, we have $D^{-1}(C_m - \eta_m) - D^{-1}(C_m) \geq \eta_m/d(0)$. Finally, note that from (57), $\phi'(D^{-1}(C_m) + \mu_m) = \phi' \circ \phi^{-1}(\log\tau_m + \phi(|D^{-1}(C_m)|))$, which gives the result. □

8.4 Proof of Corollary 4.4 

Let us prove (i). First note that ^ oc as soon as m ^ oc. The first claim in (i) easily 
derives from (28), because is larger than ^ ^fl^) (log Qm — log^) foi" large m if (31) holds 

and because q^^ > 1. Next, we prove the second claim only in the case of the location model 
(the scale case in similar). Assume {B{(j))) and (C('0)) for ip — (j)' o(j)~^. From above, we only 
have to prove that is not asymptotically optimal whenever (31) is not fulfilled. For this, we 
apply Theorem 3.1 (ii) and we prove that any regime for which (31) is violated leads to (20). 
Up to consider a subsequence, we can assume that Cm tends to some constant C G (0, 1). It 
is thus sufficient to prove that < C for = limsup^{(l — q^)j^Fjn{q^T:^)}. 
Let us first note that the following holds from (57): 



where qm ol^ - 1 and where we let Km ^ 'D \g^V^^) - ^(logr^ + (\){\D \c^)|)). 
Next, from {B((())) and (74), there exists a constant K > such that for any t small enough, 
D ^{t) > 0"-^ (log 1/t — log 00^0 (/)~^ (log 1/t) —log K). Also, from Appendix B, we can always 
assume that am < 1/2, i.e. qm ^ ^ foi" large m, and thus q'^r:^ necessarily converges to 
zero. Moreover, 6~ is increasing; and concave on M+, of derivative l/cj)' o 0-1. Thus we can 
write for m large enough, 

l^m >0"'^(logT^ + logg^ - logO(/)' O (/)"\logT^g^) - \0gK) - 0"^ ( log + (/)(|i:>"\Cm) |)) 
log qm - log 00' O (j)-^ {log Tmqm) -logK - (l){\D~^ {Cm)\ 



> 



-^{{logrm + log qm - log oc/)' O (j)-^ {log Tmqm) - log K) V {log Tm + (l){\D (C^)|))) 



We now use the latter bound in order to prove < C in any regime for which (31) is 
violated. 

- if am does not converges to 0: up to consider a subsequence, there is a- G (0, 1) 
such that am > o^- for m large enough. Hence log qm is bounded and we can use 
(C(^))to show that Km - -iQg^^^^^-Hiogr^gm) ^^^^^ ^^^^ rpj^j^ implies that < 

limsup^{(l - q-') + }C < (1 - a.)C < C. 

- if am and (log g'^)/r^'^ does not converges to zero: up to consider a subsequence, 
(logg^)/r^'^ converges to some ^ G (0, +oc]. First, if logg^ = o(logTm), we can use 
(C('0))to show liminf^A^^ > liminf^ ^^7^ = ^- Second, if (loggrn)/(logT^) does not 
converges to zero, it is larger than 5 G (0, +oc) for m large enough (up to consider a 
subsequence). Hence, we have logr^ < ^"■'^logg'^ for m large enough which entails 
liminf^ A^^r? > liminf^ / ^^^^ ^. Moreover, the latter is bounded away from 

zero because cp' o (j)~^((S~^ + l)logg^) = O(logg^) by using Finally, in any 

case, we obtain that liminf^ K^n > and thus — D (^D ^{C) + liminf^ < C. 

This concludes the proof for (i). 

Let us now prove (ii). First, we consider the sparsity regime where m/rm > (logm)-^^^ 
for some 9 > 0. This condition implies that for any > 0, e"^^/'^'^ tends to zero faster 
than any power function t~^, A > 0. In particular, since by assumption ^^(x) = 0(e^^) for 
X +OC, e"^^/'^^ converges to zero faster than In the second sparsity regime where 

"^/^m ^ ^ ^ (0, +oc), we have m/rm which is a bounded sequence. Finally, in any of the two 
sparsity regimes, the result follows from Corollary 4.3 (ii), because Rm{t^) ^ ^m^{^ ~ C) and 
> am/m. 

8.5 Proof for Section 4.4 

In the Laplace case, some useful relations are reported in Table 4. Also remember that from Lemma 8.2, we have

$$\log\tau_m + \log\sigma_m = (\sigma_m - 1)\log(1/C_m), \qquad (69)$$

that is, $\tau_m\sigma_m = C_m^{1-\sigma_m}$. Furthermore, under (BP) and (Sp), we have $\log\tau_m \sim \log(1/C_m)\sigma_m$.
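
Relation (69) is easy to confirm numerically. The sketch below assumes the Laplace scale model in the form $F_m(t) = t^{1/\sigma_m}$ and $f_m(t) = \sigma_m^{-1}t^{1/\sigma_m - 1}$, which follows from $D(x) = e^{-x}/2$ for $x \geq 0$; the function name and the parameter values are illustrative only.

```python
import math

def laplace_scale_bayes(tau, sigma):
    """Bayes threshold, Bayes power and optimal recovery parameter in the
    Laplace scale model, assuming sigma > 1 and sigma * tau > 1."""
    # f(t) = t**(1/sigma - 1) / sigma = tau  <=>  t = (sigma*tau)**(-sigma/(sigma-1))
    t_bayes = (sigma * tau) ** (-sigma / (sigma - 1.0))
    power = t_bayes ** (1.0 / sigma)       # C = F(t^B)
    q_opt = power / (tau * t_bayes)        # q^opt = C / (tau * t^B)
    return t_bayes, power, q_opt

# Illustrative values tau = 50, sigma = 3.
t_b, C, q_opt = laplace_scale_bayes(50.0, 3.0)
lhs = math.log(50.0) + math.log(3.0)
rhs = (3.0 - 1.0) * math.log(1.0 / C)
print(lhs, rhs)    # both sides of (69)
print(q_opt)       # equals sigma in this model
```

A side observation from this computation is that the optimal recovery parameter coincides with $\sigma_m$ here, which is consistent with the role played by $\log(q_m/\sigma_m)$ in the remainder of this section.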





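Quantity | Laplace scale model
$\phi(x)$ | $x + \log 2$
$d(x)$ | $e^{-x}/2$
$D(x)$ | $e^{-x}/2$
$D^{-1}(y)$ | $-\log(2y)$
$F_m(t)$ | $t^{1/\sigma_m}$
$\Psi_m(t)$ | $t^{1/\sigma_m - 1}$
$F_m(\Psi_m^{-1}(u))$ | $(1/u)^{1/(\sigma_m - 1)}$

Table 4: Some calculations for the Laplace scale model, $x \geq 0$; $t \in (0,1)$; $u > 0$; $y < 1/2$.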
Let us first prove Proposition 4.5 and let us start by proving (33). By definition, Rmit^) 

Rm{tm) CmTTl^m (^l,m + ^2,m), whcrC Zi^^ ^mC ~^ ("i^ (qmTm)) t^) and ^2,^ 1 

C^^ Fmi"^^ {qm^m)) • On the one hand, since = {Cm)^'^ and using (69) twice, we get 



Zi,m = r^C-'^'- ( {Cm)-"- exp ( 

-if ( log (Im + log Tm + {(7m " 1) log 
= exp 

V V 1 - 

On the other hand, by using again (69), we obtain 

log qm + log Tm + {(7m " 1) log 



^2,m = 1 - exp 

= 1 — exp ( — 



7m 

log qm - logCTm ^ 
7m, — 1 
This implies, by denoting — log^m — logc^m and by using the function g, 



CT^ - 1 \am.-lJJJ CTm^ — l V CTm " 1 

This leads to (33), because e~^"^ = cFm/Oim- 

Next, we can prove (34) by applying Theorem 3.2. By using the above computation of 
^2,m5 we have 

. (a ^^r.f log(«m /^m)-log(^0,m(l-^)) 

7m < 1 1 - exp ( - 

<Crr 



(log(g^/cr^) - log(7ro,m(l " ^)))_ 

am. — 1 



because for any G M, (1 — e ^)+ < u^. This gives (34) for a = 1. The case where a = 2 is 
similar: 

(1 - Cm Fm{a/m))+ V ^""^ V _ I J J 

(log(a-V^m) + log(m/T^)) 



< 



1 



by using (69). This finishes the proof of Proposition 4.5. 

Second, let us prove Corollary 4.6 and more specifically the equivalence (35). Assume (BP) and (Sp) and that $\log(q_m/\sigma_m)$ has a limit in $\mathbb{R} \cup \{-\infty, +\infty\}$ (up to considering a subsequence). As both conditions entail that pFDR thresholding is asymptotically optimal, we can assume that $q_m \to \infty$ and $\log q_m = o(\log\tau_m)$ (see Corollary 4.4 (i)). Next, as $g$ satisfies $g(x) = O(x^2)$ as $x \to 0$; $g(x) \sim x$ as $x \to +\infty$; $g(\log u) \sim 1/u$ as $u \to 0$, we easily check from (33) that the following holds:

- if $\log(q_m/\sigma_m) \to 0$, the relative excess risk tends to zero faster than $1/(\log\tau_m)$;

- if $\log(q_m/\sigma_m) \to \ell \in \mathbb{R}\setminus\{0\}$, the relative excess risk is of order $1/(\log\tau_m)$;

- if $\log(q_m/\sigma_m) \to -\infty$ or $\log(q_m/\sigma_m) \to +\infty$, the relative excess risk tends to zero slower than $1/(\log\tau_m)$.

This entails (35). Finally, let us prove (36). First note that $\sigma_m \sim (\log(1/C))^{-1}\beta\log m$. Hence, if the limit in (36) is zero, we have from (35), $\forall\beta \in B$, $q_m \sim (\log(1/C))^{-1}\beta\log m$ (up to considering a subsequence). This is impossible as soon as $B$ contains at least two elements (because $q_m$ and the subsequence do not depend on $\beta$).
A Expressions for tails and quantiles



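Lemma A.1. Let $d(x) = e^{-\phi(|x|)}$ for any $x \in \mathbb{R}$, where $\phi$ is a function satisfying $(A(\phi))$. Then $D(x) = \int_x^{\infty}e^{-\phi(|u|)}du$ has the following properties:

• for any $x > 0$, we have

$$D(x) \leq d(x)/\phi'(x); \qquad (70)$$

• for any $t \in (0, 1/2)$ s.t. $\phi'(D^{-1}(t)) \geq 1$, we have $-\log t \geq \phi(0)$ and

$$D^{-1}(t) \leq \phi^{-1}(-\log t); \qquad (71)$$

If additionally $\phi$ satisfies $(B(\phi))$, and by letting $K = 1 + \phi''(1)/(\phi'(1))^2$, the following holds:

• for any $x > 0$,

$$D(x) \geq \frac{d(x)}{\phi'(x)}\Big(1 + \frac{\phi''(x)}{(\phi'(x))^2}\Big)^{-1} \qquad (72)$$
$$\geq \frac{d(x)}{K\phi'(x)} \quad \text{for } x \geq 1; \qquad (73)$$

• for any $t \in (0, D(1))$ s.t. $\phi'(D^{-1}(t)) \geq 1$, we have $-\log t \geq \phi(0)$ and

$$D^{-1}(t) \geq \phi^{-1}\big(\phi(0) \vee (-\log t - \log K - \log \circ \phi' \circ \phi^{-1}(-\log t))\big). \qquad (74)$$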



Proof. First note that $\phi'(x) > 0$ in (70) because $\phi$ is increasing and convex. Next, (70) holds because $\phi'$ is nondecreasing: $D(x) = \int_x^{\infty}e^{-\phi(u)}du \leq (\phi'(x))^{-1}\int_x^{\infty}\phi'(u)e^{-\phi(u)}du = d(x)/\phi'(x)$. Expression (71) follows from (70) applied with $x = D^{-1}(t)$. To prove (72), write for any $x > 0$,

$$D(x) = \frac{d(x)}{\phi'(x)} - \int_x^{\infty}\frac{\phi''(u)}{(\phi'(u))^2}e^{-\phi(u)}du,$$

by using an integration by parts. Expressions (72) and (73) follow. Finally, let us prove (74). From (73), we get $Kt\phi'(D^{-1}(t)) \geq e^{-\phi(D^{-1}(t))}$ and thus $-\log(Kt) - \log \circ \phi'(D^{-1}(t)) \leq \phi(D^{-1}(t))$. Hence, we can conclude by using (72).

□
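
In the Gaussian case, where $\phi(x) = x^2/2 + \log\sqrt{2\pi}$, $\phi'(x) = x$ and $\phi''(x)/(\phi'(x))^2 = x^{-2}$, the upper bound (70) is the classical Mills-ratio bound $\bar\Phi(x) \leq d(x)/x$. The sketch below checks it on a grid, together with the classical lower bound $\bar\Phi(x) \geq x\,d(x)/(1 + x^2)$; reading the latter as the Gaussian instance of (72) is an assumption about the intended form of that display. Function names are hypothetical.

```python
import math

def d(x):
    """Standard Gaussian density."""
    return math.exp(-x * x / 2.0) / math.sqrt(2.0 * math.pi)

def D(x):
    """Standard Gaussian upper tail, computed with erfc for tail accuracy."""
    return 0.5 * math.erfc(x / math.sqrt(2.0))

def upper_bound(x):
    """d(x) / phi'(x) with phi'(x) = x."""
    return d(x) / x

def lower_bound(x):
    """(d(x) / x) / (1 + 1 / x**2), i.e. x * d(x) / (1 + x**2)."""
    return d(x) / x / (1.0 + 1.0 / (x * x))

grid = [0.1 * k for k in range(1, 101)]   # x from 0.1 to 10
print(all(lower_bound(x) <= D(x) <= upper_bound(x) for x in grid))
```

Both bounds become tight as $x$ grows, which is the mechanism behind the equivalents for $t^B_m$ stated in Lemmas 8.1 and 8.2.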



B A sub-optimality result 

The next proposition states a sub-optimality result when choosing a recovery parameter $q_m \leq q_+ < 1$, that is, a level $\alpha_m \geq \alpha_- > 1/2$.

Proposition B.1. Under Assumption $(A(F_m, \tau_m))$, let us choose $q_m \leq 1$ (i.e., $\alpha_m \geq 1/2$) in the pFDR threshold $t^\star_m$. Then we have for any $m \geq 2$,

$$R_m(t^\star_m) \geq R_m(t^B_m)(C_m(1/q_m - 1) + 1). \qquad (75)$$

In particular, under (BP), if $q_m \leq q_+ < 1$ (i.e., $\alpha_m \geq \alpha_- > 1/2$),

$$\liminf_m\{R_m(t^\star_m)/R_m(t^B_m)\} > 1$$

and $t^\star_m$ is not asymptotically optimal.

Proof. First, since $F_m(t) = t\Psi_m(t)$,

$$R_m(t^B_m) \leq \pi_{0,m}\tau_m^{-1}, \qquad (76)$$

because $\Psi_m(t^B_m) \geq f_m(t^B_m) = \tau_m$ from the concavity of $F_m$.

Second, assuming $q_m \leq 1$, we have $\Psi_m(t^B_m) \geq \tau_m \geq q_m\tau_m = \Psi_m(t^\star_m)$. Hence $t^B_m \leq t^\star_m$ and $F_m(t^\star_m) \geq C_m$. By using (49), we get $R_m(t^\star_m) \geq \pi_{0,m}\tau_m^{-1}(C_m(1/q_m - 1) + 1)$, which, combined with (76), leads to (75). □